text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

Overview

text-detection-ctpn

Scene text detection based on ctpn (connectionist text proposal network). It is implemented in tensorflow. The origin paper can be found here. Also, the origin repo in caffe can be found in here. For more detail about the paper and code, see this blog. If you got any questions, check the issue first, if the problem persists, open a new issue.


NOTICE: Thanks to banjin-xjy, banjin and I have reonstructed this repo. The old repo was written based on Faster-RCNN, and remains tons of useless code and dependencies, make it hard to understand and maintain. Hence we reonstruct this repo. The old code is saved in branch master


roadmap

  • reonstruct the repo
  • cython nms and bbox utils
  • loss function as referred in paper
  • oriented text connector
  • BLSTM

setup

nms and bbox utils are written in cython, hence you have to build the library first.

cd utils/bbox
chmod +x make.sh
./make.sh

It will generate a nms.so and a bbox.so in current folder.


demo

  • follow setup to build the library
  • download the ckpt file from googl drive or baidu yun
  • put checkpoints_mlt/ in text-detection-ctpn/
  • put your images in data/demo, the results will be saved in data/res, and run demo in the root
python ./main/demo.py

training

prepare data

  • First, download the pre-trained model of VGG net and put it in data/vgg_16.ckpt. you can download it from tensorflow/models
  • Second, download the dataset we prepared from google drive or baidu yun. put the downloaded data in data/dataset/mlt, then start the training.
  • Also, you can prepare your own dataset according to the following steps.
  • Modify the DATA_FOLDER and OUTPUT in utils/prepare/split_label.py according to your dataset. And run split_label.py in the root
python ./utils/prepare/split_label.py
  • it will generate the prepared data in data/dataset/
  • The input file format demo of split_label.py can be found in gt_img_859.txt. And the output file of split_label.py is img_859.txt. A demo image of the prepared data is shown below.


train

Simplely run

python ./main/train.py
  • The model provided in checkpoints_mlt is trained on GTX1070 for 50k iters. It takes about 0.25s per iter. So it will takes about 3.5 hours to finished 50k iterations.

some results

NOTICE: all the photos used below are collected from the internet. If it affects you, please contact me to delete them.


oriented text connector

  • oriented text connector has been implemented, i's working, but still need futher improvement.
  • left figure is the result for DETECT_MODE H, right figure for DETECT_MODE O

Comments
  • How to export model for Tensorflow Serving?

    How to export model for Tensorflow Serving?

    Follow some tutorials of exporting model for Tensorflow Serving, I've come up below trial:

    cfg_from_file('ctpn/text.yml')
    config = tf.ConfigProto(allow_soft_placement=True)
    with tf.Session(config=config) as sess:
            net = get_network("VGGnet_test")
    
            saver = tf.train.Saver()
            try:
                ckpt = tf.train.get_checkpoint_state(cfg.TEST.checkpoints_path)
                saver.restore(sess, ckpt.model_checkpoint_path)
            except:
                raise 'Missing pre-trained model: {}'.format(ckpt.model_checkpoint_path)
    
           # The main export trial is here
           #############################
            export_path = os.path.join(
                tf.compat.as_bytes('/tmp/ctpn'),
                tf.compat.as_bytes(str(1)))
            builder = tf.saved_model.builder.SavedModelBuilder(export_path)
    
            freezing_graph = sess.graph
            prediction_signature = tf.saved_model.signature_def_utils.predict_signature_def(
                inputs={'input': freezing_graph.get_tensor_by_name('Placeholder:0')},
                outputs={'output': freezing_graph.get_tensor_by_name('Placeholder_1:0')}
            )
    
            builder.add_meta_graph_and_variables(
                sess,
                [tf.saved_model.tag_constants.SERVING],
                signature_def_map={
                    tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature
                },
                clear_devices=True)
    
            builder.save()
            print('[INFO] Export SavedModel into {}'.format(export_path))
            #############################
    

    With output of freezing_graph is:

    Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32)
    Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("rpn_cls_score/Reshape_1:0", shape=(?, ?, ?, 20), dtype=float32)
    Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32)
    Tensor("Reshape_2:0", shape=(?, ?, ?, 20), dtype=float32)
    Tensor("rpn_bbox_pred/Reshape_1:0", shape=(?, ?, ?, 40), dtype=float32)
    Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32)
    

    After I got /tmp/ctpn/1 exported model, I try to load into Tensorflow Serving server:

    tensorflow_model_server --port=9000 --model_name=ctpn --model_base_path=/tmp/ctpn
    

    But it came up an error:

    Loading servable: {name: detector version: 1} failed: Not found: Op type not registered 'PyFunc' in binary running on [...]. Make sure the Op and Kernel are registered in the binary running in this process.
    

    So there are 2 questions:

    • Am I right about the inputs (Placeholder:0) and the outputs (Placeholder_1:0) of prediction_signature
    • Where do I miss PyFunc?
    opened by hiepph 20
  • exuse me !!!!do you have some tricks when training model?why do i train MLT dataset as default parameters and after 50000iters,it can detect nothing?

    exuse me !!!!do you have some tricks when training model?why do i train MLT dataset as default parameters and after 50000iters,it can detect nothing?

    exuse me !!!! do you have some tricks when training model?why do i train MLT dataset as default parameters and after 50000 iters,it can detect nothing?

    opened by cjt222 15
  • BiLSTM and Training Time

    BiLSTM and Training Time

    Thanks for sharing your implementation with us. I have implemented CTPN with Caffe which failed to converge when adding LSTM. First, I want to ask whether you have added the BiLSTM in your code or not. I am new to tensorflow. After looking at the code, I think you just implement the LSTM not the BiLSTM, is it right ? Second, I want to ask how long did you train your model? I have run the train script of your programs on a GPU device. It seems that it would take 5-6 days to finish the first 180000 iterations.

    Thanks very much.

    opened by TaoDream 14
  • I try to change code from python2 to python3

    I try to change code from python2 to python3

    I try to change code from python2 to python3,but when I finish all the mistake,and run the code,it caused error below,i do not know where it come from and how to carry out it,how can i do?Thank you so much

    2017-10-26 14:07:03.163176: W tensorflow/core/framework/op_kernel.cc:1158] Unknown: KeyError: b'TEST' Traceback (most recent call last): File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call return fn(*args) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn status, run_metadata) File "/usr/local/lib/python3.6/contextlib.py", line 88, in exit next(self.gen) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status pywrap_tensorflow.TF_GetCode(status)) tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5/_75, rpn_bbox_pred/Reshape/_77, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]] [[Node: rois/PyFunc/_79 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_290_rois/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "ctpn/demo.py", line 95, in _, _ = test_ctpn(sess, net, im) File "/root/chengjuntao/text-detection-ctpn/lib/fast_rcnn/test.py", line 171, in test_ctpn rois = sess.run([net.get_output('rois')[0]],feed_dict=feed_dict) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run run_metadata_ptr) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run feed_dict_string, options, run_metadata) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run target_list, options, run_metadata) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5/_75, rpn_bbox_pred/Reshape/_77, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]] [[Node: rois/PyFunc/_79 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_290_rois/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

    Caused by op 'rois/PyFunc', defined at: File "ctpn/demo.py", line 85, in net = get_network("VGGnet_test") File "/root/chengjuntao/text-detection-ctpn/lib/networks/factory.py", line 20, in get_network return VGGnet_test() File "/root/chengjuntao/text-detection-ctpn/lib/networks/VGGnet_test.py", line 14, in init self.setup() File "/root/chengjuntao/text-detection-ctpn/lib/networks/VGGnet_test.py", line 68, in setup .proposal_layer(_feat_stride, anchor_scales, 'TEST', name='rois')) File "/root/chengjuntao/text-detection-ctpn/lib/networks/network.py", line 28, in layer_decorated layer_output = op(self, layer_input, *args, **kwargs) File "/root/chengjuntao/text-detection-ctpn/lib/networks/network.py", line 241, in proposal_layer [tf.float32,tf.float32]) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func input=inp, token=token, Tout=Tout, name=name) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func name=name) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op op_def=op_def) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op original_op=self._default_original_op, op_def=op_def) File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in init self._traceback = _extract_stack()

    UnknownError (see above for traceback): KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5/_75, rpn_bbox_pred/Reshape/_77, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]] [[Node: rois/PyFunc/_79 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_290_rois/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]

    opened by cjt222 14
  • ModuleNotFoundError: No module named 'lib'

    ModuleNotFoundError: No module named 'lib'

    When I try to execute the demo.py im receiving this error

    File "demo.py", line 9, in from lib.networks.factory import get_network ModuleNotFoundError: No module named 'lib'

    Can someone please help me with this ?

    opened by SuryaprakashNSM 12
  • AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'

    AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'

    File "/home/hjq/PycharmProjects/Tensorflow-OCR/text-detection-ctpn-master/ctpn/demo.py", line 103, in raise 'Check your pretrained {:s}'.format(ckpt.model_checkpoint_path) AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'

    opened by seawater668 11
  • After CTPN. What is your idea?

    After CTPN. What is your idea?

    Hello. eragonruan! I was very impressed with your code. One more time. Thank you so much.

    I have a problem. When the image passes through the CTPN, a green box is created. If I put this image in the OCR engine, how should I separate it (green box)? My idea is to use OpenCV. But the green box is on the text. so it hides the text. Can I draw a thin line to solve the problem?

    opened by rudebono 10
  • win10 on cpu encounter KeyError: b'TEST' error

    win10 on cpu encounter KeyError: b'TEST' error

    @eragonruan My enviroment is win10+cpu+tensorflow1.3. I have spend lot of time for this; I'm confused about this. Please help me on your spare time. Thanks!

    Here is the error: `2018-07-26 11:11:58.723047: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations. 2018-07-26 11:11:58.728604: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations. Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32) Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32) Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32) Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32) Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32) Tensor("rpn_cls_score/Reshape_1:0", shape=(?, ?, ?, 20), dtype=float32) Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32) Tensor("Reshape_2:0", shape=(?, ?, ?, 20), dtype=float32) Tensor("rpn_bbox_pred/Reshape_1:0", shape=(?, ?, ?, 40), dtype=float32) Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32) Loading network VGGnet_test... Restoring from checkpoints/VGGnet_fast_rcnn_iter_50000.ckpt... done 2018-07-26 11:12:09.006369: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Unknown: KeyError: b'TEST' Traceback (most recent call last): File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call return fn(*args) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1306, in _run_fn status, run_metadata) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\contextlib.py", line 88, in exit next(self.gen) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status pywrap_tensorflow.TF_GetCode(status)) tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_2, rpn_bbox_pred/Reshape_1, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "./ctpn/demo.py", line 97, in _, _ = test_ctpn(sess, net, im) File "D:\workspace\text-detection-ctpn-master\lib\fast_rcnn\test.py", line 51, in test_ctpn rois = sess.run([net.get_output('rois')[0]],feed_dict=feed_dict) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 895, in run run_metadata_ptr) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1124, in _run feed_dict_tensor, options, run_metadata) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run options, run_metadata) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_2, rpn_bbox_pred/Reshape_1, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]

    Caused by op 'rois/PyFunc', defined at: File "./ctpn/demo.py", line 82, in net = get_network("VGGnet_test") File "D:\workspace\text-detection-ctpn-master\lib\networks\factory.py", line 8, in get_network return VGGnet_test() File "D:\workspace\text-detection-ctpn-master\lib\networks\VGGnet_test.py", line 15, in init self.setup() File "D:\workspace\text-detection-ctpn-master\lib\networks\VGGnet_test.py", line 56, in setup .proposal_layer(_feat_stride, anchor_scales, 'TEST', name='rois')) File "D:\workspace\text-detection-ctpn-master\lib\networks\network.py", line 21, in layer_decorated layer_output = op(self, layer_input, *args, **kwargs) File "D:\workspace\text-detection-ctpn-master\lib\networks\network.py", line 215, in proposal_layer [tf.float32,tf.float32]) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\ops\script_ops.py", line 203, in py_func input=inp, token=token, Tout=Tout, name=name) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\ops\gen_script_ops.py", line 36, in _py_func name=name) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op op_def=op_def) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\ops.py", line 2630, in create_op original_op=self._default_original_op, op_def=op_def) File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\ops.py", line 1204, in init self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

    UnknownError (see above for traceback): KeyError: b'TEST' [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_2, rpn_bbox_pred/Reshape_1, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]`

    opened by Crocodiles 9
  •   UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. This may consume a large amount of memory.

    UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. This may consume a large amount of memory.

    when i try to train my own datasets, i faced this core dump,

    Computing bounding-box regression targets... bbox target means: [[ 0. 0. 0. 0.] [ 0. 0. 0. 0.]] [ 0. 0. 0. 0.] bbox target stdevs: [[ 0.1 0.1 0.2 0.2] [ 0.1 0.1 0.2 0.2]] [ 0.1 0.1 0.2 0.2] Normalizing targets done Solving... /data/resys/var/python2.7.3/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:91: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. "Converting sparse IndexedSlices to a dense Tensor of unknown shape. " Segmentation fault (core dumped)

    this fault caused by the function train_model() in lib/fast-rcnn/train.py , when it run train_op=opt.apply_gradients(list(zip(grads,tvars),global_step=global_step)) .

    have you ever faced this error?

    opened by louisly 8
  • WHERE ARE gt_img_1001.txt ... gt_img_6000.txt FILES?!

    WHERE ARE gt_img_1001.txt ... gt_img_6000.txt FILES?!

    I cloned the project, downloaded the pretrained model and the training data, unzipped them then got a folder named TEXTVOC that contains other folders (Annotations, ImagesSets and JPEGImages). I placed TEXTVOC in the data folder and edited the paths in the split_label.py as following: path = '/home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/JPEGImages' gt_path = '/home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/Annotations'. But, when I run it python lib/prepare_training_data/split_label.py, I got this error:

    /home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/JPEGImages/img_1001.jpg
    Traceback (most recent call last):
      File "lib/prepare_training_data/split_label.py", line 34, in <module>
        with open(gt_file, 'r') as f:
    FileNotFoundError: [Errno 2] No such file or directory: '/home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/Annotations/gt_img_1001.txt'
    

    As it is shown, the gt_img_1001.txt is not missing, which means whether the gt_path is wrong whether those files do not exist. I looked into all folders and I didn't find any file starting with gt_* PS: I also run this command: ln -s TEXTVOC VOCdevkit2007, so I didn't miss anything that should be done.

    opened by maky-hnou 7
  • how to run the train scripts

    how to run the train scripts

    在split_label.py中有两个路径path = '/media/D/code/OCR/text-detection-ctpn/data/mlt_english+chinese/image'和gt_path = '/media/D/code/OCR/text-detection-ctpn/data/mlt_english+chinese/label',代码中会分别读取这个两个路径下的文件,请问....../label中放什么样的文件?谢谢

    opened by jibadallz 7
  • Bump tensorflow-gpu from 1.4.0 to 2.9.3

    Bump tensorflow-gpu from 1.4.0 to 2.9.3

    Bumps tensorflow-gpu from 1.4.0 to 2.9.3.

    Release notes

    Sourced from tensorflow-gpu's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This releases introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow-gpu's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.14.2 to 1.22.0

    Bump numpy from 1.14.2 to 1.22.0

    Bumps numpy from 1.14.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Model Quantization

    Model Quantization

    The pretrained model is favorable as it detects text with high accuracy. However, it takes long time to inference with CPU, this is out of expectation in production deployment. Is there a way to quantize the model?

    What I've tried is to convert the checkpoint model to saved_model format, then load from saved_model to perform quantization with TFLite converter, code snippet as followed:

    # Load checkpoint and convert to saved_model
    import tf
    trained_checkpoint_prefix = "checkpoints_mlt/ctpn_50000.ckpt"
    export_dir = "exported_model"
    
    graph = tf.Graph()
    with tf.compat.v1.Session(graph=graph) as sess:
        # Restore from checkpoint
        loader = tf.compat.v1.train.import_meta_graph(trained_checkpoint_prefix + ".meta")
        loader.restore(sess, trained_checkpoint_prefix)
    
    # Export checkpoint to SavedModel
    builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_dir)
    builder.add_meta_graph_and_variables(sess,
                                         [tf.saved_model.TRAINING, tf.saved_model.SERVING],
                                         strip_default_attrs=True)
    builder.save()
    

    In result, I got a .pb file and and a variables folder with checkpoint and index files inside. Then errors popped out when I tried to perform quantization:

    converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8  # or tf.uint8
    converter.inference_output_type = tf.int8  # or tf.uint8
    tflite_quant_model = converter.convert()
    

    This is the error:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-4-03205673177f> in <module>
         11 converter.inference_input_type = tf.int8  # or tf.uint8
         12 converter.inference_output_type = tf.int8  # or tf.uint8
    ---> 13 tflite_quant_model = converter.convert()
    
    ~/virtualenvironment/tf2/lib/python3.6/site-packages/tensorflow/lite/python/lite.py in convert(self)
        450     # TODO(b/130297984): Add support for converting multiple function.
        451     if len(self._funcs) != 1:
    --> 452       raise ValueError("This converter can only convert a single "
        453                        "ConcreteFunction. Converting multiple functions is "
        454                        "under development.")
    
    ValueError: This converter can only convert a single ConcreteFunction. Converting multiple functions is under development.
    

    Understand that this error was raised due to the multiple inputs input_image and input_im_info required by the model. Appreciate if anyone could help.

    opened by leeshien 0
  • Bump ipython from 5.1.0 to 7.16.3

    Bump ipython from 5.1.0 to 7.16.3

    Bumps ipython from 5.1.0 to 7.16.3.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
Releases(untagged-48d74c6337a71b6b5f87)
Owner
Shaohui Ruan
Interested in machine learning & computer vision
Shaohui Ruan
Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Handwritten Text Recognition with TensorFlow Update 2021: more robust model, faster dataloader, word beam search decoder also available for Windows Up

Harald Scheidl 1.5k Jan 07, 2023
Visual Attention based OCR

Attention-OCR Authours: Qi Guo and Yuntian Deng Visual Attention based OCR. The model first runs a sliding CNN on the image (images are resized to hei

Yuntian Deng 1.1k Jan 02, 2023
huoyijie 1.2k Dec 29, 2022
かの有名なあの東方二次創作ソング、「bad apple!」のMVをPythonでやってみたって話

bad apple!! 内容 このプログラムは、bad apple!(feat. nomico)のPVをPythonを用いて再現しよう!という内容です。 実はYoutube並びにGithub上に似たようなプログラムがあったしなんならそっちの方が結構良かったりするんですが、一応公開しますw 使い方 こ

赤紫 8 Jan 05, 2023
Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding for Zero-Example Video Retrieval.

Dual Encoding for Video Retrieval by Text Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding

81 Dec 01, 2022
Neural search engine for AI papers

Papers search Neural search engine for ML papers. Demo Usage is simple: input an abstract, get the matching papers. The following demo also showcases

Giancarlo Fissore 44 Dec 24, 2022
Erosion and dialation using structure element in OpenCV python

Erosion and dialation using structure element in OpenCV python

Tamzid hasan 2 Nov 11, 2021
Automatically remove the mosaics in images and videos, or add mosaics to them.

Automatically remove the mosaics in images and videos, or add mosaics to them.

Hypo 1.4k Dec 30, 2022
Python rubik's cube solver

This program makes a 3D representation of a rubiks cube and solves it step by step.

Pablo QB 4 May 29, 2022
Deep Learning Chinese Word Segment

引用 本项目模型BiLSTM+CRF参考论文:http://www.aclweb.org/anthology/N16-1030 ,IDCNN+CRF参考论文:https://arxiv.org/abs/1702.02098 构建 安装好bazel代码构建工具,安装好tensorflow(目前本项目需

2.1k Dec 23, 2022
This is the code for our paper DAAIN: Detection of Anomalous and AdversarialInput using Normalizing Flows

Merantix-Labs: DAAIN This is the code for our paper DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows which can be found at

Merantix 14 Oct 12, 2022
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

Ed Medvedev 933 Dec 29, 2022
CUTIE (TensorFlow implementation of Convolutional Universal Text Information Extractor)

CUTIE TensorFlow implementation of the paper "CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor." Xiaohu

Zhao,Xiaohui 147 Dec 20, 2022
Python Computer Vision application that allows users to draw/erase on the screen using their webcam.

CV-Virtual-WhiteBoard The Virtual WhiteBoard is a project I made using the OpenCV and Mediapipe Python libraries. Using your index and middle finger y

Stephen Wang 1 Jan 07, 2022
CNN+LSTM+CTC based OCR implemented using tensorflow.

CNN_LSTM_CTC_Tensorflow CNN+LSTM+CTC based OCR(Optical Character Recognition) implemented using tensorflow. Note: there is No restriction on the numbe

Watson Yang 356 Dec 08, 2022
Optical character recognition for Japanese text, with the main focus being Japanese manga

Manga OCR Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Tran

Maciej Budyś 327 Jan 01, 2023
Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

Bailando Code for CVPR 2022 (oral) paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory" [Paper] | [Project Page] | [Vi

Li Siyao 237 Dec 29, 2022
Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd.

Head Detector Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd. The head_detection mod

Ramana Subramanyam 76 Dec 06, 2022
(CVPR 2021) ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection

ST3D Code release for the paper ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection, CVPR 2021 Authors: Jihan Yang*, Shaoshu

CVMI Lab 224 Dec 28, 2022