OCR training with paddlepaddle-gpu 3.0.0

li三河 · Published 2025-12-27 20:17:23

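1. The server uses an NVIDIA A100 with CUDA 12.4 installed, and the closest available paddlepaddle-gpu build targets CUDA 12.3. (Only versions 3.0.0 beta1 and below produce a .pdmodel file when exporting with export_model, which is the format older versions need.)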

https://www.paddlepaddle.org.cn/packages/stable/cu123/paddlepaddle-gpu/

pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/

Verify the installation:

python -c "import paddle; paddle.utils.run_check()"

2. Use PaddleOCR-release-2.7.1 for Chinese text recognition:

Eval:
  dataset:
    name: SimpleDataSet           # dataset type
    data_dir: /home/xxx/data/ocr/  # dataset root directory
    label_file_list:              # validation label files
      - /home/xxx/data/ocr/crop_sign_1013/val_list.txt

The layout is as shown above. The full path of each image is data_dir joined with the first field of its label line.
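As a rough illustration of that path rule, the sketch below parses label lines and joins data_dir with the first tab-separated field. The helper name `resolve_samples` and the sample entries are hypothetical, and it assumes label lines of the form `relative_path<TAB>text`, the same format the test script later in this post splits on.

```python
import os

def resolve_samples(data_dir, label_lines):
    """Return (image_path, text) pairs from 'relative_path<TAB>text' lines."""
    samples = []
    for line in label_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip malformed lines that have no label field
        rel_path, text = parts[0], parts[1]
        samples.append((os.path.join(data_dir, rel_path), text))
    return samples

# Hypothetical entries, for illustration only
demo_lines = ["crop_sign_1013/img_0001.jpg\t张三\n", "broken line without a tab\n"]
demo = resolve_samples("/home/xxx/data/ocr/", demo_lines)
print(demo)
```

Running a check like this over a label file before training is a cheap way to catch paths that do not resolve under data_dir.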

Next, download pretrained weights so that training fine-tunes an existing model instead of starting from scratch:

https://www.paddleocr.ai/latest/version3.x/pipeline_usage/OCR.html#1-ocr

pretrained_model: ./signedname/PP-OCRv4_server_rec_pretrained.pdparams
character_dict_path: ./signedname/PP-OCRv4.txt

3. Training

PaddleOCR-release-2.7.1/doc/doc_ch/recognition.md describes the procedure in detail.

Single-GPU training:

python ./tools/train.py -c ./signedname/rec_svtrnet_ch.yml -o Global.pretrained_model=./signedname/ch_PP-OCRv4_rec_train

Multi-GPU training:

python -m paddle.distributed.launch --gpus '0,1' ./tools/train.py -c ./signedname/rec_svtrnet_ch.yml

Export the model:

python ./tools/export_model.py -c ./signedname/rec_svtrnet_ch.yml -o Global.pretrained_model=./output/rec/svtr_ch_all/best_accuracy  Global.save_inference_dir=./output/rec/svtr_ch_all/inference/

Export the model in PaddleOCR-3.3.2:

python ./tools/export_model.py -c ./signedname/PP-OCRv5_server_rec.yml -o Global.pretrained_model=./output/PP-OCRv5_server_rec/latest.pdparams  Global.save_inference_dir=./output/PP-OCRv5_server_rec/inference/

4. Test with PP-OCRv5 in paddleocr (export the model first, then run recognition)

import os
# Note: this only affects child processes; the running interpreter has already chosen its I/O encoding
os.environ['PYTHONIOENCODING'] = 'utf-8'

# Recognition only (no text detection)
from paddleocr import TextRecognition

# Initialize the recognizer from the exported inference model
model = TextRecognition(
    model_name="PP-OCRv5_server_rec",
    model_dir="./output/PP-OCRv5_server_rec2/inference/"
)

def test_batch_image():
    input_txt = './ocr/train_list.txt'
    output_txt = './ocr/output.txt'

    # Read the label list: one "relative_path<TAB>text" entry per line
    with open(input_txt, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    error_results = []
    total_images = 0  # images that were actually evaluated

    for line in lines:
        parts = line.strip().split('\t')
        if len(parts) < 2:
            continue

        image_path, gt_text = parts[0], parts[1]

        image_path = "./ocr/" + image_path

        # Skip entries whose image file is missing
        if not os.path.exists(image_path):
            print(f"File not found: {image_path}")
            continue

        # Run recognition
        result = model.predict(input=image_path, batch_size=1)
        # Extract the recognized text; an empty result counts as an empty string
        rec_text = ""
        if result and len(result) > 0:
            rec_text = result[0]['rec_text'] or ""
        total_images += 1
        print(f"{image_path}\t{gt_text}\t{rec_text}")

        # Compare the prediction with the ground-truth label
        if rec_text != gt_text:
            error_results.append(f"{image_path}\t{gt_text}\t{rec_text}")

    # Save the misrecognized samples to a file
    if error_results:
        with open(output_txt, 'w', encoding='utf-8') as f:
            f.write("file\tground_truth\tprediction\n")
            for error in error_results:
                f.write(error + "\n")
        print(f"Errors saved to {output_txt}: {len(error_results)} records")

    # Print summary statistics, based only on images that were evaluated
    error_count = len(error_results)
    accuracy = (total_images - error_count) / total_images * 100 if total_images else 0.0

    print("\nSummary:")
    print(f"Total images: {total_images}")
    print(f"Misrecognized: {error_count}")
    print(f"Accuracy: {accuracy:.2f}%")

if __name__ == '__main__':
    # test_single_image()
    print("\nStarting batch test...")
    test_batch_image()
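The script above reports exact-match accuracy, so a label with one wrong character counts the same as a completely wrong one. As a complementary view, the sketch below computes a character-level similarity from the Levenshtein distance. It is an illustrative, self-contained helper, not PaddleOCR's own RecMetric code, and both function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def normalized_similarity(gt_text, rec_text):
    """Return 1 - distance / max(len); 1.0 means an exact match."""
    longest = max(len(gt_text), len(rec_text))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - edit_distance(gt_text, rec_text) / longest

print(normalized_similarity("张三丰", "张三"))
```

Averaging this value over the evaluated images gives a softer score that separates near misses from outright failures.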

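5. Testing with an older version (using ch_PP-OCRv4_rec)

This uses the older paddleocr==2.7.0.3 with paddlepaddle-gpu==2.4.2.post117. In that release, paddleocr.py hard-codes the recognition image shape as (3, 48, 320), so keep the training image shape consistent with it.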
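To make the consequence of a fixed (3, 48, 320) input concrete, the sketch below computes how an image of arbitrary size would fit into it. It assumes the common recognition preprocessing scheme: scale to height 48 while keeping the aspect ratio, cap the width at 320, and pad the remainder. The function name `rec_resize_plan` is hypothetical and this is not the library's own code.

```python
import math

def rec_resize_plan(img_h, img_w, target_h=48, target_w=320):
    """Return (resized_w, pad_w) for an aspect-preserving resize to target_h."""
    # Width after scaling the height to target_h, rounded up
    scaled_w = math.ceil(target_h * img_w / img_h)
    # Very wide images are squeezed to target_w; narrow ones are padded
    resized_w = max(1, min(target_w, scaled_w))
    return resized_w, target_w - resized_w

print(rec_resize_plan(96, 400))   # a 96x400 crop
print(rec_resize_plan(32, 640))   # a very wide 32x640 crop
```

Under this assumption, crops much wider than 320/48 of their height get compressed horizontally, which is worth checking against the typical aspect ratio of the training data.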

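6. PP-OCRv5_server_rec.yml configuration parameters

Part of the character dictionary (Japanese, Korean, icons, and similar entries) was removed, leaving a Chinese dictionary of 15000 entries.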
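After trimming a dictionary, it may be worth checking that every character in the training labels is still covered, since characters outside the dictionary cannot be learned. The sketch below is one simple way to do that. The helper name `uncovered_chars` and the demo data are hypothetical; in practice the dictionary list would be read from the file given in character_dict_path, assuming one character per line.

```python
def uncovered_chars(dict_chars, labels):
    """Return label characters missing from the dictionary, with their counts."""
    known = set(dict_chars)
    missing = {}
    for text in labels:
        for ch in text:
            if ch not in known:
                missing[ch] = missing.get(ch, 0) + 1
    return missing

# Hypothetical dictionary and labels, for illustration only
demo_dict = ["张", "三", "李"]
demo_labels = ["张三", "李四", "王四"]
print(uncovered_chars(demo_dict, demo_labels))
```

An empty result means the trimmed dictionary still covers the labels; otherwise the counts show which characters to add back.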

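Some useful parameters are not exposed in the default PP-OCRv5_server_rec.yml. For example:

use_pos in Head enables positional encoding of the input.

In Loss, CTCLoss can enable focal loss. The weights weight_1 and weight_2 are also not exposed and default to 1 in the code. As a rule of thumb, choose weight_1 and weight_2 so that weight_1*CTCLoss is roughly equal to weight_2*NRTRLoss.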
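The balancing rule above can be sketched as a one-line calculation. The helper name `balance_weight_2` and the loss values are hypothetical; in practice the two loss magnitudes would be read from the training log.

```python
def balance_weight_2(ctc_loss, nrtr_loss, weight_1=1.0):
    """Pick weight_2 so that weight_1 * ctc_loss equals weight_2 * nrtr_loss."""
    if nrtr_loss <= 0:
        raise ValueError("nrtr_loss must be positive")
    return weight_1 * ctc_loss / nrtr_loss

# Hypothetical loss magnitudes, for illustration only
print(balance_weight_2(ctc_loss=3.0, nrtr_loss=10.0))
```

Since both losses change during training, a value computed this way is only a starting point and may need revisiting.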

Global:
  model_name: PP-OCRv5_server_rec # To use static model for inference.
  debug: false
  use_gpu: true
  epoch_num: 75
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/PP-OCRv5_server_rec1
  save_epoch_step: 5
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  calc_epoch_interval: 1
  pretrained_model: ./signedname/PP-OCRv5_server_rec_pretrained.pdparams
  checkpoints:
  save_inference_dir: ./output/PP-OCRv5_server_rec/inference
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ./signedname/ppocrv5_dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv5.txt
  d2s_train_image_shape: [3, 48, 320]


Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05


Architecture:
  model_type: rec
  algorithm: SVTR_HGNet
  Transform:
  Backbone:
    name: PPHGNetV2_B4
    text_rec: True
  Head:
    name: MultiHead
    use_pos: True    # helps on the signature task
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: *max_text_length

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
        use_focal_loss: true  # handles class imbalance; default is false
    - NRTRLoss:
  weight_1: 1.0   # default 1.0
  weight_2: 0.3   # default 1.0

PostProcess:  
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: /home/xxx/data/ocr/hwge_with_ocr_label
    ext_op_transform_idx: 1
    label_file_list:
    - /home/xxx/data/ocr/hwge_with_ocr_label/crop_sign_1013/train_list.txt
    - /home/xxx/data/ocr/hwge_with_ocr_label/crop_sign_1013/test_list.txt
    - /home/xxx/data/ocr/hwge_with_ocr_label/crop_sign_1016/train_list.txt
    - /home/xxx/data/ocr/hwge_with_ocr_label/crop_sign_1016/test_list.txt  
    - /home/xxx/data/ocr/hwge_with_ocr_label/crop_sign_1117s/train_list.txt   
    - /home/xxx/data/ocr/hwge_with_ocr_label/crop_sign_1117s/test_list.txt 
    - /home/xxx/data/ocr/hwge_with_ocr_label/bmp0926n/train_list.txt 
    - /home/xxx/data/ocr/hwge_with_ocr_label/bmp0926n/test_list.txt 
    - /home/xxx/data/ocr/hwge_with_ocr_label/bmp_231117/train_list.txt 
    - /home/xxx/data/ocr/hwge_with_ocr_label/bmp_231117/test_list.txt 
    - /home/xxx/data/ocr/hwge_with_ocr_label/font_sign/train_list.txt 
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
        max_text_length: *max_text_length
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 350
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 16
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/xxx/data/ocr/hwge_with_ocr_label
    label_file_list:
    - /home/xxx/data/ocr/hwge_with_ocr_label/crop_sign_1013/val_list.txt
    - /home/xxx/data/ocr/hwge_with_ocr_label/crop_sign_1016/val_list.txt
    - /home/xxx/data/ocr/hwge_with_ocr_label/crop_sign_1117s/val_list.txt
    - /home/xxx/data/ocr/hwge_with_ocr_label/bmp0926n/val_list.txt
    - /home/xxx/data/ocr/hwge_with_ocr_label/bmp_231117/val_list.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: *bs
    num_workers: 12
