Pix2Text
- breezedeus/Pix2Text
- MIT
- 国内开发者维护
- 简体中文&英文 使用的 CnOCR, 其他使用的 EasyOCR
- p2t 命令行 https://pix2text.readthedocs.io/zh-cn/stable/command/
- macOS 桌面工具 breezedeus/Pix2Text-Mac
- 参考
- reshape
- 224
- 512
- 768
- 1024
- 1536
- 2048
caution
- table-ocr 可能会错误的识别出 table spanning cell
- 导致一列少了内容
# 目前最高只支持 Python 3.12
# CnSTD>=1.2.1, CnOCR>=2.2.2.1, transformers>=4.37.0
# pip install pix2text[multilingual] # 多语言 - 除了 简体中文和英文以外
pip install pix2text -i https://mirrors.aliyun.com/pypi/simple
# for GPU
pip uninstall onnxruntime
pip install onnxruntime-gpu
# --file-type [pdf|page|text_formula|formula|text]
# -i, --img-file-or-dir TEXT
p2t predict \
-l en,ch_sim --disable-formula --enable-table \
--resized-shape 768 \
--file-type pdf \
-i docs/examples/test-doc.pdf \
-o output-md \
--save-debug-res output-debug
# 启动 HTTP 服务
p2t serve -l en,ch_sim -H 0.0.0.0 -p 8503
curl -X POST \
-F "file_type=page" \
-F "resized_shape=768" \
-F "embed_sep= $,$ " \
-F "isolated_sep=$$\n, \n$$" \
-F "image=@docs/examples/page2.png;type=image/jpeg" \
http://0.0.0.0:8503/pix2text
Notes
- Pix2Text
- #from_config(total_configs:{layout,text_formula,table},enable_table,enable_formula)
- recognize
- recognize_pdf
- recognize_page
- recognize_text
- recognize_text_formula
- TableOCR
- recognize(out_cells,out_objects,out_html,out_csv,out_markdown)
AutoModelForObjectDetection.from_pretrained("$HOME/.pix2text/1.1/table-ocr")
- DocYoloLayoutParser
- layout 识别
- LayoutLMv3LayoutParser
- 之前的 layout 识别
- 数据目录 - data_dir(), root - PIX2TEXT_HOME=$HOME/.pix2text
- model 目录
data_dir()/MODEL_VERSION
例如~/.pix2text/1.1
- layout-docyolo/
- mfd-onnx/
- mfr-onnx/
- table-rec/
- model 目录
- ~/.cnstd - breezedeus/CnSTD - 基于 RapidOCR 集成 PPOCRv4
- CN STD - 中文文本检测
- ~/.cnocr - breezedeus/CnOCR
- CN OCR - 中文文本识别
- PIX2TEXT_DOWNLOAD_SOURCE=HF
from torchvision import transforms
# TableOCR 的输入处理
structure_transform = transforms.Compose(
[
MaxResize(1000),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
]
)
table-rec
- 表格结构识别
- 模型 breezedeus/pix2text-table-rec
- fork microsoft/table-transformer-structure-recognition-v1.1-all
- ⚠️ 模型完全相同,只是 fork 了仓库
- TATR - Table Transformer
- classes
- table
- table column
- table row
- table column header
- table projected row header
- table spanning cell
- no object
- 参考
- Table Transformer (TATR) microsoft/table-transformer
- https://huggingface.co/microsoft/table-transformer-structure-recognition
CN OCR
# pip install cnocr[ort-gpu]
pip install cnocr[ort-cpu] -i https://mirrors.aliyun.com/pypi/simple