重磅教程】RF-DETR实战完全指南：从入门到精通，手把手教你掌握SOTA实时目标检测

这意味着RF-DETR对各种场景和领域都有出色的适应能力，即使你的数据与COCO差异很大，也能获得良好的性能。2025年3月20日，Roboflow发布了RF-DETR，这是首个在COCO数据集上突破60 mAP的实时模型。RF-DETR的核心创新在于它抛弃了传统CNN架构，采用了基于Transformer的DEtection TRansformer（DETR）架构。特别值得一提的是，RF-DET

datayx

871人浏览 · 2026-03-21 23:39:27

datayx · 2026-03-21 23:39:27 发布

向AI转型的程序员都关注公众号机器学习AI算法工程

2025年3月，目标检测领域迎来了一次重大突破。Roboflow发布了RF-DETR，这是第一个在Microsoft COCO数据集上实现超过60 mAP的实时目标检测模型。本文将带你从零开始，全面掌握RF-DETR的使用方法。

一、为什么RF-DETR值得关注？

1.1 性能突破：重新定义实时检测的边界

想象一下：你正在开发一个实时监控系统，需要在100毫秒内检测画面中的所有目标。传统方案中，你不得不在速度和精度之间做出艰难的取舍——要么牺牲精度换取速度，要么忍受延迟来追求精确检测。

RF-DETR改变了这一切。

2025年3月20日，Roboflow发布了RF-DETR，这是首个在COCO数据集上突破60 mAP的实时模型。简单说，它在保持超快推理速度的同时，达到了前所未有的检测精度。

让我们看看一组令人印象深刻的对比数据：

在相同硬件配置下（NVIDIA T4 GPU）：

RF-DETR Small
：mAP 53.0，延迟3.52ms
YOLO11 Small
：mAP 44.4，延迟3.16ms
YOLO11 Large
：mAP 51.2，延迟11.92ms

这说明什么？更小的RF-DETR模型在精度上大幅超越更大的YOLO模型，同时速度更快。

1.2 技术创新：Transformer架构的优势

RF-DETR的核心创新在于它抛弃了传统CNN架构，采用了基于Transformer的DEtection TRansformer（DETR）架构。这带来了三个关键优势：

1. 端到端训练，告别NMS

传统检测器（如YOLO）依赖非极大值抑制（NMS）来过滤重复检测，这不仅增加了计算开销，还可能误删有效检测。RF-DETR通过匈牙利匹配算法实现端到端训练，完全不需要NMS，既提高了速度又保证了准确性。

2. 更强的泛化能力

RF-DETR基于DINOv2视觉骨干网络，这是一个在1.42亿张图片上训练的自监督模型。这意味着RF-DETR对各种场景和领域都有出色的适应能力，即使你的数据与COCO差异很大，也能获得良好的性能。

3. 灵活的分辨率调整

RF-DETR支持在运行时调整输入分辨率，无需重新训练。这意味着你可以根据实际需求在精度和速度之间动态平衡——需要更高精度时提高分辨率，需要更快速度时降低分辨率。

1.3 应用场景：从研究到生产

RF-DETR的适用场景非常广泛：

工业检测
：PCB板缺陷检测、产品质检、流水线监控
自动驾驶
：行人车辆检测、交通标志识别、障碍物检测
智能安防
：入侵检测、行为分析、异常监控
医疗影像
：病灶检测、细胞计数、医学诊断辅助
农业监测
：作物病害识别、害虫检测、产量评估

特别值得一提的是，RF-DETR在RF100-VL（一个包含100个真实世界数据集的基准测试）上表现出色，证明了它在实际应用中的强大能力。

二、RF-DETR核心概念速览

2.1 什么是RF-DETR？

RF-DETR（Roboflow DETEction TRansformer）是一个基于Transformer架构的实时目标检测模型，由Roboflow开发并于2025年3月开源。

技术栈组成：

DINOv2骨干网络
：提供强大的视觉特征提取能力
Deformable DETR架构
：通过可变形注意力机制实现高效检测
权重共享NAS
：通过神经架构搜索优化精度-延迟平衡

2.2 架构解析：RF-DETR如何工作？

为了帮助你理解RF-DETR的工作原理，我们用一个通俗的类比：

传统检测器（如YOLO） 就像一个拿着放大镜的侦探，在图像的不同位置仔细搜索，可能会错过一些细节，也可能重复搜索同一区域。

RF-DETR就像一个聪明的侦探团队，他们会先快速扫描整个场景，确定哪里需要重点关注，然后集中资源进行精确搜索。这种"注意力机制"让检测既高效又准确。

技术层面：

特征提取
：DINOv2骨干网络提取图像特征
编码器处理
：使用可变形注意力机制处理特征图
解码器预测
：通过对象查询生成边界框和类别预测
损失计算
：使用匈牙利匹配损失进行端到端训练

2.3 RF-DETR vs 其他模型对比

让我们看看RF-DETR与其他主流模型的详细性能对比：

在COCO数据集上的表现：

表格

模型	参数量	COCO mAP50	COCO mAP50:95	延迟(ms)	FPS (T4)
RF-DETR Nano	-	67.6	48.4	2.32	431
RF-DETR Small	-	72.1	53.0	3.52	284
RF-DETR Medium	-	73.6	54.7	4.52	221
RF-DETR Large	129M	-	56.5	6.8	147
YOLO11 n	2.9M	52.0	37.4	2.49	402
YOLO11 s	10.1M	59.7	44.4	3.16	316
YOLO11 m	22.4M	64.1	48.6	5.13	195
YOLO11 l	27.6M	65.3	50.2	6.65	150
YOLO11 x	62.1M	66.5	51.2	11.92	84

在RF100-VL基准测试上的表现：

RF-DETR在领域适应性方面表现尤为出色，这是因为它基于DINOv2骨干网络，具有强大的泛化能力。

表格

模型	RF100VL mAP50	RF100VL mAP50:95
RF-DETR Nano	84.1	57.1
RF-DETR Small	85.9	59.6
RF-DETR Medium	86.6	60.6
YOLO11 n	81.4	55.3
YOLO11 s	82.3	56.2
YOLO11 m	82.5	56.5

关键发现：

精度优势明显
：RF-DETR Small在mAP50:95上比YOLO11x高出1.8%
速度更快
：在相同精度下，RF-DETR的延迟比YOLO低约20-30%
泛化能力更强
：在RF100-VL基准测试中，RF-DETR的表现远超YOLO系列
帕累托最优
：RF-DETR在精度-速度权衡上达到了帕累托最优

三、环境搭建与安装

3.1 系统要求

硬件要求：

GPU
：NVIDIA GPU（推荐RTX 3060及以上）
- CUDA 11.8 / 12.0 / 12.1
- 显存：推理时至少4GB，训练时建议8GB+
CPU
：多核处理器（推荐Intel i7/i9或AMD Ryzen 7/9）
内存
：至少16GB（推荐32GB）
存储
：至少50GB可用空间（用于数据集和模型）

软件要求：

操作系统
：Ubuntu 20.04+ / Windows 10+ / macOS（需要GPU支持）
Python
：3.9 - 3.11（推荐3.10）
CUDA
：11.8+（推荐12.1）
PyTorch
：2.0+

3.2 安装步骤详解

步骤1：创建虚拟环境

使用Conda（推荐）：

# 创建名为rfdetr的Python 3.10环境
conda create -n rfdetr python=3.10-y

# 激活环境
conda activate rfdetr

使用venv：

# 创建虚拟环境
python -m venv rfdetr

# 激活环境（Linux/macOS）
source rfdetr/bin/activate

# 激活环境（Windows）
rfdetr\Scripts\activate

步骤2：安装PyTorch

根据你的CUDA版本选择对应的PyTorch版本：

CUDA 11.8：

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

CUDA 12.1：

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

CPU版本（不推荐，仅用于测试）：

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

验证PyTorch安装：

python -c"import torch; print(f'PyTorch版本: {torch.__version__}'); print(f'CUDA可用: {torch.cuda.is_available()}')"

步骤3：安装RF-DETR

标准安装：

pip install rfdetr

安装开发版（获取最新功能）：

pip install git+https://github.com/roboflow/rf-detr.git

步骤4：安装依赖工具

# Supervision：用于可视化和数据处理
pip install supervision

# 其他常用工具
pip install opencv-python pillow matplotlib

步骤5：验证安装

创建一个测试文件test_install.py：

import torch
from rfdetr import RFDETRBase

print("="*50)
print("RF-DETR环境检查")
print("="*50)
print(f"PyTorch版本: {torch.__version__}")
print(f"CUDA可用: {torch.cuda.is_available()}")
if torch.cuda.is_available():
print(f"CUDA版本: {torch.version.cuda}")
print(f"GPU数量: {torch.cuda.device_count()}")
print(f"当前GPU: {torch.cuda.get_device_name(0)}")

try:
    model = RFDETRBase()
print(f"RF-DETR版本: 成功加载RFDETRBase")
print("✓ RF-DETR安装成功！")
except Exception as e:
print(f"✗ RF-DETR加载失败: {e}")

print("="*50)

运行测试：

python test_install.py

预期输出：

==================================================
RF-DETR环境检查
==================================================
PyTorch版本: 2.1.0+cu121
CUDA可用: True
CUDA版本: 12.1
GPU数量: 1
当前GPU: NVIDIA GeForce RTX 3080
RF-DETR版本: 成功加载RFDETRBase
✓ RF-DETR安装成功！
==================================================

3.3 常见安装问题解决

问题1：pip安装速度慢

# 使用清华镜像源
pip install rfdetr -i https://pypi.tuna.tsinghua.edu.cn/simple

问题2：CUDA版本不匹配

# 检查系统CUDA版本
nvcc --version

# 检查PyTorch CUDA版本
python -c"import torch; print(torch.version.cuda)"

# 如果不匹配，重新安装匹配的PyTorch版本
pip uninstall torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

问题3：Windows上安装失败

# 确保安装了Visual C++ Redistributable
# 下载地址：https://aka.ms/vs/17/release/vc_redist.x64.exe

# 或者使用预编译的wheel文件
pip install rfdetr --no-cache-dir

四、快速上手：模型推理

4.1 图片推理：从零开始

让我们从最简单的图片检测开始，完整演示整个流程。

代码示例：单张图片检测

import io
import requests
import supervision as sv
from PIL import Image
from rfdetr import RFDETRBase
from rfdetr.util.coco_classes import COCO_CLASSES

print("正在初始化RF-DETR模型...")
model = RFDETRBase()
print("模型初始化完成！")

# 方法1：从URL加载图片
url ="https://media.roboflow.com/dog.jpeg"
print(f"正在下载图片: {url}")
image = Image.open(io.BytesIO(requests.get(url).content))

# 方法2：从本地文件加载
# image = Image.open("test.jpg")

print("正在执行推理...")
detections = model.predict(image, threshold=0.5)

print(f"\n检测结果:")
print(f"- 检测到 {len(detections)} 个目标")
print(f"- 置信度: {detections.confidence.tolist()}")
print(f"- 类别ID: {detections.class_id.tolist()}")

# 生成标签
labels =[
f"{COCO_CLASSES[class_id]}{confidence:.2f}"
for class_id, confidence
inzip(detections.class_id, detections.confidence)
]

# 可视化结果
annotated_image = image.copy()

# 绘制边界框
box_annotator = sv.BoxAnnotator()
annotated_image = box_annotator.annotate(annotated_image, detections)

# 绘制标签
label_annotator = sv.LabelAnnotator()
annotated_image = label_annotator.annotate(annotated_image, detections, labels)

# 保存结果
output_path ="result.jpg"
annotated_image.save(output_path)
print(f"\n结果已保存到: {output_path}")

# 显示结果
sv.plot_image(annotated_image)

代码解析：

模型初始化
：RFDETRBase()会自动下载并加载COCO预训练权重
阈值设置
：threshold=0.5表示只保留置信度大于0.5的检测结果
检测结果解析
：

detections.xyxy
：边界框坐标 [x1, y1, x2, y2]
detections.confidence
：置信度分数
detections.class_id
：类别ID（对应COCO 80类）

可视化
：使用supervision库绘制边界框和标签

高级用法：调整推理参数

# 使用不同的置信度阈值
detections = model.predict(image, threshold=0.3)# 更低的阈值，检测更多目标

# 使用自定义分辨率
model_high_res = RFDETRBase(resolution=672)# 更高分辨率，更高精度
detections = model_high_res.predict(image, threshold=0.5)

# 批量推理（多张图片）
images =[Image.open(f"image_{i}.jpg")for i inrange(1,6)]
for img in images:
    detections = model.predict(img, threshold=0.5)
# 处理检测结果

4.2 视频推理：处理视频文件

视频检测是RF-DETR的常见应用场景，下面是一个完整的视频处理示例。

代码示例：视频文件检测

import supervision as sv
from rfdetr import RFDETRBase
from rfdetr.util.coco_classes import COCO_CLASSES

defprocess_video_detection(input_path, output_path, threshold=0.5):
"""
    处理视频文件，进行目标检测

    参数:
        input_path: 输入视频路径
        output_path: 输出视频路径
        threshold: 检测阈值
    """
# 初始化模型
print("初始化RF-DETR模型...")
    model = RFDETRBase()
print("模型初始化完成")

# 定义回调函数
defcallback(frame, index):
# 执行检测
        detections = model.predict(frame, threshold=threshold)

# 生成标签
        labels =[
f"{COCO_CLASSES[class_id]}{confidence:.2f}"
for class_id, confidence
inzip(detections.class_id, detections.confidence)
]

# 可视化
        annotated_frame = frame.copy()
        annotated_frame = sv.BoxAnnotator().annotate(annotated_frame, detections)
        annotated_frame = sv.LabelAnnotator().annotate(annotated_frame, detections, labels)

# 显示进度
if index %30==0:
print(f"已处理 {index} 帧，检测到 {len(detections)} 个目标")

return annotated_frame

# 处理视频
print(f"开始处理视频: {input_path}")
    sv.process_video(
        source_path=input_path,
        target_path=output_path,
        callback=callback
)
print(f"视频处理完成，结果保存到: {output_path}")

# 使用示例
if __name__ =="__main__":
    process_video_detection(
        input_path="input.mp4",
        output_path="output.mp4",
        threshold=0.5
)

高级功能：实时视频流处理

import cv2
import supervision as sv
from rfdetr import RFDETRBase
from rfdetr.util.coco_classes import COCO_CLASSES

classRealTimeVideoDetector:
def__init__(self, threshold=0.5):
        self.model = RFDETRBase()
        self.threshold = threshold
        self.box_annotator = sv.BoxAnnotator()
        self.label_annotator = sv.LabelAnnotator()

defprocess_frame(self, frame):
"""处理单帧"""
        detections = self.model.predict(frame, threshold=self.threshold)

        labels =[
f"{COCO_CLASSES[class_id]}{confidence:.2f}"
for class_id, confidence
inzip(detections.class_id, detections.confidence)
]

        annotated_frame = frame.copy()
        annotated_frame = self.box_annotator.annotate(annotated_frame, detections)
        annotated_frame = self.label_annotator.annotate(annotated_frame, detections, labels)

return annotated_frame

defrun(self, source=0, output_path=None):
"""
        运行实时检测

        参数:
            source: 视频源（0为摄像头，或视频文件路径）
            output_path: 输出视频路径（可选）
        """
        cap = cv2.VideoCapture(source)

# 设置输出
if output_path:
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            fps = cap.get(cv2.CAP_PROP_FPS)
            width =int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            height =int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            out = cv2.VideoWriter(output_path, fourcc, fps,(width, height))

print("按 'q' 键退出...")
        frame_count =0

whileTrue:
            ret, frame = cap.read()
ifnot ret:
break

# 处理帧
            annotated_frame = self.process_frame(frame)

# 显示结果
            cv2.imshow("RF-DETR 实时检测", annotated_frame)

# 保存结果
if output_path:
                out.write(annotated_frame)

            frame_count +=1
if frame_count %30==0:
print(f"已处理 {frame_count} 帧")

# 退出条件
if cv2.waitKey(1)&0xFF==ord('q'):
break

        cap.release()
if output_path:
            out.release()
        cv2.destroyAllWindows()
print(f"处理完成，共 {frame_count} 帧")

# 使用示例
if __name__ =="__main__":
    detector = RealTimeVideoDetector(threshold=0.5)

# 摄像头检测
    detector.run(source=0)

# 或处理视频文件
# detector.run(source="input.mp4", output_path="output.mp4")

4.3 摄像头实时检测：边学边做

实时检测是RF-DETR最吸引人的应用之一，下面是一个完整的实现。

代码示例：摄像头实时检测

import cv2
import time
import supervision as sv
from rfdetr import RFDETRBase
from rfdetr.util.coco_classes import COCO_CLASSES

classWebcamDetector:
def__init__(self, model_size='base', threshold=0.5, resolution=None):
"""
        初始化摄像头检测器

        参数:
            model_size: 模型大小 ('nano', 'small', 'base', 'large')
            threshold: 检测阈值
            resolution: 输入分辨率（必须是56的倍数）
        """
print(f"初始化RF-DETR模型（{model_size}）...")

# 根据模型大小选择对应的类
if model_size =='nano':
from rfdetr import RFDETRNano
            self.model = RFDETRNano()
elif model_size =='small':
from rfdetr import RFDETRSmall
            self.model = RFDETRSmall()
elif model_size =='large':
from rfdetr import RFDETRLarge
            self.model = RFDETRLarge()
else:# base
from rfdetr import RFDETRBase
            self.model = RFDETRBase()

if resolution:
from rfdetr import RFDETRBase
            self.model = RFDETRBase(resolution=resolution)

        self.threshold = threshold
        self.box_annotator = sv.BoxAnnotator()
        self.label_annotator = sv.LabelAnnotator()

# 性能统计
        self.frame_count =0
        self.total_time =0

print("模型初始化完成！")

defdetect(self, frame):
"""
        检测单帧

        参数:
            frame: 输入帧（numpy数组）

        返回:
            annotated_frame: 标注后的帧
            detections: 检测结果
            fps: 当前FPS
        """
        start_time = time.time()

# 执行检测
        detections = self.model.predict(frame, threshold=self.threshold)

# 生成标签
        labels =[
f"{COCO_CLASSES[class_id]}{confidence:.2f}"
for class_id, confidence
inzip(detections.class_id, detections.confidence)
]

# 可视化
        annotated_frame = frame.copy()
        annotated_frame = self.box_annotator.annotate(annotated_frame, detections)
        annotated_frame = self.label_annotator.annotate(annotated_frame, detections, labels)

# 计算FPS
        inference_time = time.time()- start_time
        self.total_time += inference_time
        self.frame_count +=1
        fps =1.0/ inference_time if inference_time >0else0

return annotated_frame, detections, fps

defrun(self, camera_id=0, show_fps=True):
"""
        运行实时检测

        参数:
            camera_id: 摄像头ID（默认0）
            show_fps: 是否显示FPS
        """
# 打开摄像头
        cap = cv2.VideoCapture(camera_id)
ifnot cap.isOpened():
print("无法打开摄像头")
return

print("摄像头检测已启动，按 'q' 键退出...")
print("按 's' 键保存当前帧")

whileTrue:
            ret, frame = cap.read()
ifnot ret:
break

# 检测
            annotated_frame, detections, fps = self.detect(frame)

# 显示FPS
if show_fps:
                avg_fps = self.frame_count / self.total_time if self.total_time >0else0
                cv2.putText(annotated_frame,f'FPS: {fps:.1f} (Avg: {avg_fps:.1f})',
(10,30), cv2.FONT_HERSHEY_SIMPLEX,1,(0,255,0),2)

# 显示结果
            cv2.imshow("RF-DETR 实时检测", annotated_frame)

# 键盘控制
            key = cv2.waitKey(1)&0xFF
if key ==ord('q'):# 退出
break
elif key ==ord('s'):# 保存帧
                timestamp = time.strftime("%Y%m%d_%H%M%S")
                cv2.imwrite(f"frame_{timestamp}.jpg", annotated_frame)
print(f"已保存帧: frame_{timestamp}.jpg")

# 清理
        cap.release()
        cv2.destroyAllWindows()

# 统计信息
        avg_fps = self.frame_count / self.total_time if self.total_time >0else0
print(f"\n统计信息:")
print(f"- 总帧数: {self.frame_count}")
print(f"- 平均FPS: {avg_fps:.2f}")
print(f"- 平均推理时间: {self.total_time/self.frame_count*1000:.2f}ms")

# 使用示例
if __name__ =="__main__":
# 创建检测器（使用Small模型平衡速度和精度）
    detector = WebcamDetector(model_size='small', threshold=0.5)

# 运行检测
    detector.run(camera_id=0, show_fps=True)

性能优化建议：

选择合适的模型大小

实时应用：推荐Small或Base
追求速度：使用Nano
追求精度：使用Large

调整分辨率

# 降低分辨率提高速度
model = RFDETRBase(resolution=448)

# 提高分辨率增加精度
model = RFDETRBase(resolution=672)

使用GPU加速

确保PyTorch正确使用GPU：

import torch
print(f"CUDA可用: {torch.cuda.is_available()}")
print(f"当前设备: {torch.cuda.current_device()if torch.cuda.is_available()else'CPU'}")

批量处理

如果需要处理多个摄像头，可以使用多线程：

import threading

defprocess_camera(camera_id):
    detector = WebcamDetector(model_size='small')
    detector.run(camera_id=camera_id)

# 启动多个线程
threads =[]
for i inrange(4):
    t = threading.Thread(target=process_camera, args=(i,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

五、RF-DETR模型变体详解

5.1 完整模型规格对比

RF-DETR提供了从Nano到2XLarge的多种规格，以满足不同应用场景的需求：

表格

模型	类名	参数量	分辨率	COCO mAP50	COCO mAP50:95	延迟(ms)	FPS (T4)	适用场景
RF-DETR Nano	RFDETRNano	-	512×512	67.6	48.4	2.32	431	移动端、边缘设备
RF-DETR Small	RFDETRSmall	-	560×560	72.1	53.0	3.52	284	实时应用（推荐）
RF-DETR Base	RFDETRBase	29M	576×576	-	53.3	6.0	167	通用场景
RF-DETR Medium	RFDETRMedium	-	576×576	73.6	54.7	4.52	221	高精度实时
RF-DETR Large	RFDETRLarge	129M	704×704	-	56.5	6.8	147	高精度需求
RF-DETR XLarge	RFDETRXLarge	-	700×700	77.4	58.6	11.5	87	研究与生产
RF-DETR 2XLarge	RFDETR2XLarge	-	880×880	78.5	60.1	17.2	58	最高精度

注意：

Nano到Medium模型采用Apache 2.0许可证，完全开源
XLarge和2XLarge模型采用PML 1.0许可证，需要额外授权
延迟测试基于NVIDIA T4 GPU，FP16精度，批量大小=1

5.2 如何选择合适的模型？

选择模型时需要考虑三个因素：精度要求、速度要求、硬件资源。

场景1：移动端和边缘设备

推荐模型：RF-DETR Nano

理由：

最低的显存占用
最快的推理速度（431 FPS）
适合在Jetson Nano、树莓派等设备上运行

应用案例：

移动应用中的物体识别
边缘计算节点
嵌入式系统

from rfdetr import RFDETRNano

model = RFDETRNano()
detections = model.predict(image, threshold=0.5)

场景2：实时视频监控

推荐模型：RF-DETR Small 或 Medium

理由：

平衡的精度和速度
可以处理高分辨率视频流
适合实时监控和预警系统

应用案例：

安防监控系统
交通流量监控
工业流水线检测

from rfdetr import RFDETRSmall

model = RFDETRSmall(resolution=640)# 提高分辨率以获得更好的细节

场景3：高精度离线处理

推荐模型：RF-DETR Large 或 XLarge

理由：

最高的检测精度
适合离线批量处理
可以承担更长的推理时间

应用案例：

医疗影像分析
高精度质量检测
数据集标注辅助

from rfdetr import RFDETRLarge

model = RFDETRLarge(resolution=832)# 使用更高分辨率

场景4：研究与基准测试

推荐模型：RF-DETR 2XLarge

理由：

最强的性能（60.1 mAP）
适合与其他模型对比
可用于算法研究

应用案例：

学术研究
算法基准测试
模型性能分析

from rfdetr import RFDETR2XLarge

# 注意：需要安装rfdetr_plus
# pip install rfdetr[plus]
model = RFDETR2XLarge(resolution=880)

5.3 自定义输入分辨率

RF-DETR的一个强大功能是支持在运行时调整输入分辨率，无需重新训练模型。这让你可以根据实际需求动态平衡精度和速度。

分辨率调整规则：

分辨率必须是56的倍数
推荐分辨率
：392, 448, 504, 560, 616, 672, 728, 784, 840, 896

代码示例：

from rfdetr import RFDETRBase

# 使用默认分辨率（576）
model_default = RFDETRBase()

# 使用低分辨率（更快）
model_fast = RFDETRBase(resolution=448)

# 使用高分辨率（更精确）
model_accurate = RFDETRBase(resolution=672)

# 使用超高分辨率（最精确，但最慢）
model_ultra = RFDETRBase(resolution=784)

分辨率性能对比（RF-DETR Medium）：

表格

分辨率	mAP50:95	延迟(ms)	FPS	内存占用(GB)
448	52.1	2.8	357	2.1
560	54.7	4.5	222	3.2
672	56.2	6.4	156	4.5
784	57.3	8.9	112	6.1

最佳实践：

从默认分辨率开始
：先使用默认分辨率（576）测试性能
逐步调整
：根据需要逐步提高或降低分辨率
监控显存
：确保显存不会溢出
测试实际场景
：在你的真实数据上测试不同分辨率的效果

import torch
from rfdetr import RFDETRBase

deftest_resolution(image, resolution):
"""测试不同分辨率的性能"""
    model = RFDETRBase(resolution=resolution)

# 检查显存占用
if torch.cuda.is_available():
        torch.cuda.empty_cache()
        start_mem = torch.cuda.memory_allocated()

        detections = model.predict(image, threshold=0.5)

        end_mem = torch.cuda.memory_allocated()
        mem_used =(end_mem - start_mem)/1024**3# GB
print(f"分辨率 {resolution}×{resolution}: 显存占用 {mem_used:.2f}GB")
else:
        detections = model.predict(image, threshold=0.5)
print(f"分辨率 {resolution}×{resolution}: CPU模式")

return detections

# 测试多个分辨率
resolutions =[448,560,672,784]
for res in resolutions:
    test_resolution(image, res)

六、自定义数据集训练

6.1 数据集准备完整指南

数据集结构要求

RF-DETR期望数据集采用COCO格式，以下是最新的数据集目录结构：

dataset/
├── train/
│   ├── _annotations.coco.json      # 训练集标注文件
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── image3.png
│   └── ... (其他训练图片)
├── valid/
│   ├── _annotations.coco.json      # 验证集标注文件
│   ├── image1.jpg
│   └── ...
└── test/                           # 可选
    ├── _annotations.coco.json
    ├── image1.jpg
    └── ...

COCO JSON格式详解

COCO格式的标注文件包含三个主要部分：images、annotations和categories。

完整示例：

{
"info":{
"description":"我的自定义数据集",
"version":"1.0",
"year":2025,
"contributor":"你的名字",
"date_created":"2025/03/20"
},
"licenses":[],
"categories":[
{
"id":1,
"name":"person",
"supercategory":"person"
},
{
"id":2,
"name":"car",
"supercategory":"vehicle"
},
{
"id":3,
"name":"dog",
"supercategory":"animal"
}
],
"images":[
{
"id":1,
"file_name":"image1.jpg",
"width":1920,
"height":1080,
"coco_url":"",
"flickr_url":"",
"date_captured":"2025-03-20 00:00:00"
},
{
"id":2,
"file_name":"image2.jpg",
"width":1280,
"height":720,
"coco_url":"",
"flickr_url":"",
"date_captured":"2025-03-20 00:00:00"
}
],
"annotations":[
{
"id":1,
"image_id":1,
"category_id":1,
"bbox":[100,150,200,300],
"area":60000,
"iscrowd":0,
"segmentation":[]
},
{
"id":2,
"image_id":1,
"category_id":2,
"bbox":[800,400,300,250],
"area":75000,
"iscrowd":0,
"segmentation":[]
},
{
"id":3,
"image_id":2,
"category_id":3,
"bbox":[50,50,150,180],
"area":27000,
"iscrowd":0,
"segmentation":[]
}
]
}

字段说明：

**images **: 图片列表
- id
  : 图片唯一标识（必须唯一）
- file_name
  : 文件名
- width
  , height: 图片尺寸
**annotations **: 标注列表
- id
  : 标注唯一标识（必须唯一）
- image_id
  : 对应的图片ID
- category_id
  : 类别ID
- bbox
  : 边界框 [x, y, width, height]（左上角坐标 + 宽高）
- area
  : 边界框面积（width × height）
- iscrowd
  : 是否为人群标注（0=否，1=是）
**categories **: 类别列表
- id
  : 类别唯一标识（必须唯一，从1开始）
- name
  : 类别名称
- supercategory
  : 父类别（可选）

数据准备工具

工具1：使用Python创建COCO格式标注

import json
import os
from PIL import Image

defcreate_coco_annotation(image_dir, output_file, category_mapping):
"""
    将简单标注转换为COCO格式

    参数:
        image_dir: 图片目录
        output_file: 输出JSON文件路径
        category_mapping: 类别映射字典 {'class_name': category_id}
    """

# 初始化COCO结构
    coco_data ={
"info":{
"description":"Custom Dataset",
"version":"1.0",
"year":2025,
"contributor":"",
"date_created":"2025/03/20"
},
"licenses":[],
"categories":[],
"images":[],
"annotations":[]
}

# 添加类别
for class_name, cat_id in category_mapping.items():
        coco_data["categories"].append({
"id": cat_id,
"name": class_name,
"supercategory":"object"
})

    image_id =1
    annotation_id =1

# 遍历图片目录
for filename in os.listdir(image_dir):
ifnot filename.lower().endswith(('.jpg','.jpeg','.png')):
continue

        image_path = os.path.join(image_dir, filename)
        image = Image.open(image_path)
        width, height = image.size

# 添加图片信息
        coco_data["images"].append({
"id": image_id,
"file_name": filename,
"width": width,
"height": height
})

# 这里需要替换为你的实际标注数据
# 示例：假设有一个函数可以读取标注
# annotations = load_annotations(image_path)
        annotations =[]# 替换为实际数据

for ann in annotations:
            coco_data["annotations"].append({
"id": annotation_id,
"image_id": image_id,
"category_id": ann['category_id'],
"bbox": ann['bbox'],# [x, y, width, height]
"area": ann['width']* ann['height'],
"iscrowd":0
})
            annotation_id +=1

        image_id +=1

# 保存JSON文件
withopen(output_file,'w')as f:
        json.dump(coco_data, f, indent=2)

print(f"标注文件已保存到: {output_file}")
print(f"- 图片数量: {len(coco_data['images'])}")
print(f"- 标注数量: {len(coco_data['annotations'])}")

# 使用示例
if __name__ =="__main__":
    category_mapping ={
'person':1,
'car':2,
'dog':3
}

    create_coco_annotation(
        image_dir="./train",
        output_file="./train/_annotations.coco.json",
        category_mapping=category_mapping
)

工具2：使用Roboflow平台

Roboflow提供了一个强大的在线平台来管理和标注数据集：

创建项目

访问 https://roboflow.com
注册并登录
点击"Create New Project"
选择"Object Detection"
输入项目名称

上传图片
- 点击"Upload"
- 选择或拖拽图片
- 支持批量上传
标注数据
- 使用Roboflow Annotate工具
- 支持边界框和多边形标注
- 可以使用自动标注辅助功能
导出数据

点击"Export"
选择"COCO"格式
下载训练集和验证集

6.2 使用Roboflow准备数据集（详细步骤）

步骤1：创建Roboflow账户和项目

访问 https://roboflow.com
注册账户（免费账户足够使用）
创建新的Object Detection项目

步骤2：上传和标注数据

# 使用Roboflow Python SDK
import roboflow

# 初始化
rf = roboflow.Roboflow(api_key="YOUR_API_KEY")

# 获取或创建项目
project = rf.workspace("your-workspace").project("your-project")

# 上传图片
project.upload("path/to/image.jpg")

# 或批量上传
import os
for filename in os.listdir("images"):
if filename.endswith(('.jpg','.png')):
        project.upload(f"images/{filename}")

步骤3：数据增强

在Roboflow平台中，你可以轻松应用各种数据增强技术：

**旋转 **：±15°, ±30°, ±45°
**翻转 **：水平、垂直
**缩放 **：0.5× - 2.0×
**颜色调整 **：亮度、对比度、饱和度
**噪声添加 **：高斯噪声、椒盐噪声
**模糊 **：高斯模糊、运动模糊

步骤4：导出数据集

# 下载COCO格式数据集
dataset = project.version(1).download("coco")

# 数据集会自动下载并解压到当前目录

6.3 训练自定义模型

基础训练示例

from rfdetr import RFDETRBase, RFDETRLarge

# 选择模型大小
model = RFDETRBase()# 或 RFDETRLarge()

# 开始训练
model.train(
    dataset_dir="./dataset",# 数据集根目录
    epochs=50,# 训练轮数
    batch_size=8,# 批次大小
    grad_accum_steps=4,# 梯度累积步数
    lr=1e-4,# 学习率
    output_dir="./output"# 输出目录
)

高级训练配置

from rfdetr import RFDETRBase
import torch

model = RFDETRBase()

# 训练参数详解
training_args ={
# 数据参数
"dataset_dir":"./dataset",

# 训练参数
"epochs":100,# 训练轮数（建议50-100）
"batch_size":4,# 批次大小（根据显存调整）
"grad_accum_steps":8,# 梯度累积（模拟更大batch size）
"lr":1e-4,# 学习率
"weight_decay":0.0001,# 权重衰减

# 输出参数
"output_dir":"./output",

# 其他参数
"save_freq":5,# 保存频率（每N个epoch保存一次）
"eval_freq":5,# 评估频率
"num_workers":4,# 数据加载线程数
"pin_memory":True,# 使用 pinned memory（加速GPU传输）
}

# 开始训练
model.train(** training_args)

训练监控和可视化

import matplotlib.pyplot as plt
import json

defplot_training_history(log_file):
"""绘制训练曲线"""
withopen(log_file,'r')as f:
        logs = json.load(f)

    epochs =[log['epoch']for log in logs]
    train_loss =[log['train_loss']for log in logs]
    val_loss =[log.get('val_loss',0)for log in logs]
    val_map =[log.get('val_map',0)for log in logs]

    fig, axes = plt.subplots(1,3, figsize=(15,4))

# 训练损失
    axes[0].plot(epochs, train_loss, label='Train Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].set_title('Training Loss')
    axes[0].legend()

# 验证损失
    axes[1].plot(epochs, val_loss, label='Val Loss', color='orange')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Loss')
    axes[1].set_title('Validation Loss')
    axes[1].legend()

# 验证mAP
    axes[2].plot(epochs, val_map, label='Val mAP', color='green')
    axes[2].set_xlabel('Epoch')
    axes[2].set_ylabel('mAP')
    axes[2].set_title('Validation mAP')
    axes[2].legend()

    plt.tight_layout()
    plt.savefig('training_history.png')
    plt.show()

# 使用示例
# plot_training_history('./output/training_log.json')

6.4 训练最佳实践

1. 数据准备建议

数据量要求：

简单任务（<10类）：每类500-1000张图片
中等任务（10-50类）：每类1000-2000张图片
复杂任务（>50类）：每类2000+张图片

数据质量：

确保标注准确
保持标注一致性
清理模糊或异常图片
覆盖各种场景和条件

数据分布：

训练集：70-80%
验证集：10-15%
测试集：10-15%

2. 训练策略

阶段1：快速实验（1-10 epochs）

# 使用小模型快速验证
model = RFDETRBase()
model.train(
    dataset_dir="./dataset",
    epochs=5,
    batch_size=8,
    lr=1e-4,
    output_dir="./output_quick"
)

阶段2：完整训练（50-100 epochs）

# 根据快速实验结果调整参数
model = RFDETRBase()
model.train(
    dataset_dir="./dataset",
    epochs=50,
    batch_size=4,
    grad_accum_steps=8,
    lr=1e-4,
    output_dir="./output_full"
)

阶段3：精细调优（100+ epochs）

# 降低学习率进行精细调优
model = RFDETRBase(pretrain_weights="./output_full/checkpoint.pth")
model.train(
    dataset_dir="./dataset",
    epochs=30,
    batch_size=4,
    grad_accum_steps=8,
    lr=1e-5,# 降低学习率
    output_dir="./output_finetune"
)

3. 超参数调优

学习率调优：

# 学习率测试
learning_rates =[1e-5,5e-5,1e-4,5e-4,1e-3]
for lr in learning_rates:
    model.train(
        dataset_dir="./dataset",
        epochs=10,
        batch_size=8,
        lr=lr,
        output_dir=f"./output_lr_{lr}"
)

Batch Size调优：

# 根据显存调整
batch_sizes =[2,4,8,16]
for bs in batch_sizes:
    model.train(
        dataset_dir="./dataset",
        epochs=10,
        batch_size=bs,
        grad_accum_steps=max(1,16//bs),# 保持总batch size为16
        lr=1e-4,
        output_dir=f"./output_bs_{bs}"
)

4. 常见问题解决

问题1：训练过程中显存不足

# 解决方案1：减小batch size
model.train(batch_size=2,...)

# 解决方案2：使用梯度累积
model.train(batch_size=2, grad_accum_steps=8,...)

# 解决方案3：降低输入分辨率
model = RFDETRBase(resolution=448)
model.train(...)

# 解决方案4：使用更小的模型
from rfdetr import RFDETRSmall
model = RFDETRSmall()
model.train(...)

问题2：训练不收敛

# 解决方案1：降低学习率
model.train(lr=1e-5,...)

# 解决方案2：增加训练轮数
model.train(epochs=100,...)

# 解决方案3：检查数据标注
# 确保bbox格式正确：[x, y, width, height]

# 解决方案4：使用数据增强
# 在Roboflow中启用更多增强技术

问题3：过拟合

# 解决方案1：增加训练数据
# 收集更多标注数据

# 解决方案2：使用数据增强
# 应用更多的增强技术

# 解决方案3：添加权重衰减
model.train(weight_decay=0.001,...)

# 解决方案4：早停
# 定期评估验证集，选择最佳模型

6.5 使用微调模型进行推理

训练完成后，你可以使用自定义模型进行推理：

from rfdetr import RFDETRBase
import supervision as sv
from PIL import Image

# 加载自定义权重
model = RFDETRBase(pretrain_weights="./output/checkpoint.pth")

# 进行推理
image = Image.open("test_image.jpg")
detections = model.predict(image, threshold=0.5)

# 可视化
annotated_image = image.copy()
annotated_image = sv.BoxAnnotator().annotate(annotated_image, detections)
annotated_image = sv.LabelAnnotator().annotate(annotated_image, detections)

# 保存结果
annotated_image.save("result_custom.jpg")

批量推理：

import os

# 批量处理图片
input_dir ="./test_images"
output_dir ="./test_results"
os.makedirs(output_dir, exist_ok=True)

for filename in os.listdir(input_dir):
if filename.endswith(('.jpg','.png')):
        image_path = os.path.join(input_dir, filename)
        image = Image.open(image_path)

# 推理
        detections = model.predict(image, threshold=0.5)

# 可视化
        annotated = image.copy()
        annotated = sv.BoxAnnotator().annotate(annotated, detections)
        annotated = sv.LabelAnnotator().annotate(annotated, detections)

# 保存
        output_path = os.path.join(output_dir, filename)
        annotated.save(output_path)

print(f"已处理: {filename}, 检测到 {len(detections)} 个目标")

七、模型导出与部署

7.1 导出ONNX格式

ONNX（Open Neural Network Exchange）是一个开放的格式，可以让模型在不同的框架和平台之间无缝迁移。

基础导出

from rfdetr import RFDETRBase

# 初始化模型
model = RFDETRBase()

# 导出ONNX
model.export()

# 输出文件保存在当前目录的output文件夹中
# 默认文件名：model.onnx

自定义导出

from rfdetr import RFDETRBase
import torch

model = RFDETRBase()

# 导出特定分辨率
model = RFDETRBase(resolution=672)

# 导出ONNX（自定义输出路径）
model.export(output_path="./custom_model.onnx")

# 导出时指定输入尺寸
dummy_input = torch.randn(1,3,576,576)
torch.onnx.export(
    model.model,
    dummy_input,
"custom_model.onnx",
    opset_version=17,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
'input':{0:'batch_size'},
'output':{0:'batch_size'}
}
)

验证ONNX模型

import onnx
import onnxruntime as ort

# 加载并验证ONNX模型
onnx_model = onnx.load("model.onnx")
onnx.checker.check_model(onnx_model)

print("ONNX模型验证通过！")

# 使用ONNX Runtime推理
session = ort.InferenceSession("model.onnx", providers=['CUDAExecutionProvider','CPUExecutionProvider'])

# 获取输入输出信息
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

print(f"输入名称: {input_name}")
print(f"输入形状: {session.get_inputs()[0].shape}")
print(f"输出名称: {output_name}")

7.2 使用TensorRT优化

TensorRT是NVIDIA提供的高性能深度学习推理优化器，可以显著提升GPU推理速度。

安装TensorRT

# 方法1：使用pip安装
pip install tensorrt

# 方法2：从NVIDIA官网下载
# https://developer.nvidia.com/tensorrt

转换ONNX到TensorRT

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np

defconvert_onnx_to_tensorrt(onnx_path, engine_path, max_batch_size=1):
"""将ONNX模型转换为TensorRT引擎"""

# 创建TensorRT builder
    TRT_LOGGER = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(TRT_LOGGER)

# 创建网络
    network = builder.create_network(1<<int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

# 解析ONNX模型
withopen(onnx_path,'rb')as model:
        parser.parse(model.read())

# 配置builder
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE,1<<30)# 1GB

# 启用FP16
    config.set_flag(trt.BuilderFlag.FP16)

# 构建引擎
    engine = builder.build_serialized_network(network, config)

# 保存引擎
withopen(engine_path,'wb')as f:
        f.write(engine)

print(f"TensorRT引擎已保存到: {engine_path}")

# 使用示例
convert_onnx_to_tensorrt(
    onnx_path="model.onnx",
    engine_path="model.trt"
)

使用TensorRT推理

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import cv2

classTensorRTInferencer:
def__init__(self, engine_path):
# 加载引擎
        self.trt_logger = trt.Logger(trt.Logger.INFO)
withopen(engine_path,'rb')as f:
            self.engine = trt.Runtime(self.trt_logger).deserialize_cuda_engine(f.read())

# 创建上下文
        self.context = self.engine.create_execution_context()

# 分配GPU内存
        self.inputs =[]
        self.outputs =[]
        self.bindings =[]
        self.stream = cuda.Stream()

for i inrange(self.engine.num_io_tensors):
            tensor_name = self.engine.get_tensor_name(i)
            dtype = trt.nptype(self.engine.get_tensor_dtype(tensor_name))
            shape = self.context.get_tensor_shape(tensor_name)

            size = trt.volume(shape)
            host_mem = cuda.pagelocked_empty(size, dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)

            self.bindings.append(int(device_mem))

if self.engine.get_tensor_mode(tensor_name)== trt.TensorIOMode.INPUT:
                self.inputs.append({'host': host_mem,'device': device_mem,'name': tensor_name,'shape': shape,'dtype': dtype})
else:
                self.outputs.append({'host': host_mem,'device': device_mem,'name': tensor_name,'shape': shape,'dtype': dtype})

definfer(self, image):
"""执行推理"""
# 预处理图像
        input_tensor = self.preprocess(image)

# 复制输入到GPU
        np.copyto(self.inputs[0]['host'], input_tensor.ravel())
        cuda.memcpy_htod_async(self.inputs[0]['device'], self.inputs[0]['host'], self.stream)

# 执行推理
        self.context.execute_async_v3(stream_handle=self.stream.handle)

# 复制输出到CPU
for output in self.outputs:
            cuda.memcpy_dtoh_async(output['host'], output['device'], self.stream)

        self.stream.synchronize()

# 后处理
return self.postprocess([output['host']for output in self.outputs])

defpreprocess(self, image):
"""预处理"""
# 这里实现你的预处理逻辑
# 例如：resize、normalize等
return image

defpostprocess(self, outputs):
"""后处理"""
# 这里实现你的后处理逻辑
# 例如：解析检测结果、NMS等
return outputs

# 使用示例
inferencer = TensorRTInferencer("model.trt")
image = cv2.imread("test.jpg")
results = inferencer.infer(image)

7.3 部署到不同平台

1. NVIDIA Jetson部署

Jetson系列是NVIDIA的嵌入式计算平台，非常适合边缘计算应用。

# 在Jetson设备上安装依赖
sudoapt-get update
sudoapt-getinstall python3-pip python3-dev

# 安装PyTorch（Jetson版本）
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# 安装RF-DETR
pip install rfdetr supervision opencv-python

# Jetson上的推理脚本
import cv2
from rfdetr import RFDETRSmall
import supervision as sv

# 使用Small模型（适合Jetson）
model = RFDETRSmall(resolution=448)

# 打开摄像头
cap = cv2.VideoCapture(0)

whileTrue:
    ret, frame = cap.read()
ifnot ret:
break

# 推理
    detections = model.predict(frame, threshold=0.5)

# 可视化
    annotated = frame.copy()
    annotated = sv.BoxAnnotator().annotate(annotated, detections)
    annotated = sv.LabelAnnotator().annotate(annotated, detections)

# 显示
    cv2.imshow("Jetson Detection", annotated)

if cv2.waitKey(1)&0xFF==ord('q'):
break

cap.release()
cv2.destroyAllWindows()

2. CPU部署

虽然不推荐在CPU上运行RF-DETR（速度很慢），但在没有GPU的情况下也可以运行。

import torch
from rfdetr import RFDETRBase

# 强制使用CPU
device = torch.device('cpu')

model = RFDETRBase()
model = model.to(device)

# 推理时也会使用CPU
detections = model.predict(image, threshold=0.5)

3. 云端部署

使用FastAPI创建REST API服务：

from fastapi import FastAPI, UploadFile, File
from fastapi.responses import JSONResponse
import io
from PIL import Image
from rfdetr import RFDETRBase
import uvicorn

app = FastAPI(title="RF-DETR API")

# 加载模型
model = RFDETRBase()

@app.post("/detect")
asyncdefdetect(file: UploadFile = File(...)):
"""目标检测API"""
# 读取图片
    contents =awaitfile.read()
    image = Image.open(io.BytesIO(contents))

# 推理
    detections = model.predict(image, threshold=0.5)

# 转换结果为JSON
    results ={
"num_detections":len(detections),
"boxes": detections.xyxy.tolist(),
"classes": detections.class_id.tolist(),
"scores": detections.confidence.tolist()
}

return JSONResponse(content=results)

@app.get("/health")
asyncdefhealth():
"""健康检查"""
return{"status":"ok"}

if __name__ =="__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

运行服务：

python api.py

使用curl测试：

curl-X POST "http://localhost:8000/detect"\
-H"accept: application/json"\
-H"Content-Type: multipart/form-data"\
-F"file=@test.jpg"

4. Docker容器化部署

创建Dockerfile：

FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

# 安装系统依赖
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# 安装Python依赖
RUN pip3 install --no-cache-dir \
    torch torchvision torchaudio \
    rfdetr \
    supervision \
    opencv-python \
    fastapi uvicorn

# 复制应用代码
COPY app.py /app/app.py

# 工作目录
WORKDIR /app

# 暴露端口
EXPOSE 8000

# 启动服务
CMD ["python3", "app.py"]

构建和运行：

# 构建镜像
docker build -t rfdetr-api .

# 运行容器（需要GPU支持）
docker run --gpus all -p8000:8000 rfdetr-api

八、RF-DETR高级技巧

8.1 批量推理优化

处理大量图片时，批量推理可以显著提高效率。

import os
from PIL import Image
from rfdetr import RFDETRBase
import numpy as np

classBatchInferencer:
def__init__(self, model, batch_size=8):
        self.model = model
        self.batch_size = batch_size

defprocess_folder(self, input_folder, output_folder):
"""批量处理文件夹中的图片"""
        os.makedirs(output_folder, exist_ok=True)

# 获取所有图片
        image_files =[f for f in os.listdir(input_folder)
if f.lower().endswith(('.jpg','.jpeg','.png'))]

print(f"找到 {len(image_files)} 张图片")

# 分批处理
for i inrange(0,len(image_files), self.batch_size):
            batch_files = image_files[i:i+self.batch_size]
            self.process_batch(input_folder, output_folder, batch_files)

print(f"已处理 {min(i+self.batch_size,len(image_files))}/{len(image_files)} 张")

defprocess_batch(self, input_folder, output_folder, batch_files):
"""处理一个批次"""
# 加载图片
        images =[]
for filename in batch_files:
            image_path = os.path.join(input_folder, filename)
            image = Image.open(image_path)
            images.append(image)

# 批量推理
for idx, image inenumerate(images):
            detections = self.model.predict(image, threshold=0.5)

# 可视化
            annotated = image.copy()
            annotated = sv.BoxAnnotator().annotate(annotated, detections)
            annotated = sv.LabelAnnotator().annotate(annotated, detections)

# 保存结果
            output_path = os.path.join(output_folder, batch_files[idx])
            annotated.save(output_path)

# 使用示例
from rfdetr import RFDETRSmall

model = RFDETRSmall()
inferencer = BatchInferencer(model, batch_size=16)

inferencer.process_folder("./input_images","./output_images")

8.2 多尺度检测

结合多个尺度的检测结果可以提高精度，特别是对于不同大小的目标。

import cv2
import numpy as np
from rfdetr import RFDETRBase

classMultiScaleDetector:
def__init__(self, model, scales=[0.8,1.0,1.2]):
        self.model = model
        self.scales = scales

defdetect(self, image):
"""多尺度检测"""
        all_detections =[]

for scale in self.scales:
# 调整图像大小
            height, width = image.shape[:2]
            new_size =(int(width * scale),int(height * scale))
            scaled_image = cv2.resize(image, new_size)

# 推理
            detections = self.model.predict(scaled_image, threshold=0.3)

# 将检测结果转换回原始尺度
iflen(detections.xyxy)>0:
                detections.xyxy = detections.xyxy / scale
                all_detections.append(detections)

# 合并检测结果
iflen(all_detections)>0:
# 这里可以实现更复杂的合并策略
# 简单的NMS合并
            merged = self.merge_detections(all_detections)
return merged

return all_detections[0]iflen(all_detections)>0elseNone

defmerge_detections(self, detections_list):
"""合并多个检测结果"""
# 收集所有检测
        all_boxes =[]
        all_scores =[]
        all_classes =[]

for detections in detections_list:
            all_boxes.extend(detections.xyxy.tolist())
            all_scores.extend(detections.confidence.tolist())
            all_classes.extend(detections.class_id.tolist())

# 使用NMS过滤重复检测
iflen(all_boxes)>0:
import torchvision.ops as ops
            boxes_tensor = torch.tensor(all_boxes, dtype=torch.float32)
            scores_tensor = torch.tensor(all_scores, dtype=torch.float32)

            keep = ops.nms(boxes_tensor, scores_tensor, iou_threshold=0.5)

# 过滤后的结果
            filtered_boxes =[all_boxes[i]for i in keep]
            filtered_scores =[all_scores[i]for i in keep]
            filtered_classes =[all_classes[i]for i in keep]

# 创建Detections对象
from supervision import Detections
            merged_detections = Detections(
                xyxy=np.array(filtered_boxes),
                confidence=np.array(filtered_scores),
                class_id=np.array(filtered_classes)
)

return merged_detections

returnNone

# 使用示例
model = RFDETRBase()
detector = MultiScaleDetector(model, scales=[0.8,1.0,1.2])

image = cv2.imread("test.jpg")
detections = detector.detect(image)

8.3 实时视频流优化

对于实时视频流，使用多线程可以避免阻塞主线程。

import threading
import queue
import time
import cv2
from rfdetr import RFDETRBase

classRealTimeVideoProcessor:
def__init__(self, model, max_queue_size=5):
        self.model = model
        self.frame_queue = queue.Queue(maxsize=max_queue_size)
        self.result_queue = queue.Queue(maxsize=max_queue_size)
        self.running =False
        self.thread =None

defstart(self):
"""启动处理线程"""
        self.running =True
        self.thread = threading.Thread(target=self._process_frames)
        self.thread.start()

defstop(self):
"""停止处理线程"""
        self.running =False
if self.thread:
            self.thread.join()

def_process_frames(self):
"""帧处理线程"""
while self.running:
try:
# 从队列获取帧（带超时）
                frame = self.frame_queue.get(timeout=0.1)

# 推理
                detections = self.model.predict(frame, threshold=0.5)

# 放入结果队列
                self.result_queue.put((frame, detections))

except queue.Empty:
continue
except Exception as e:
print(f"处理错误: {e}")

defadd_frame(self, frame):
"""添加帧到处理队列"""
ifnot self.frame_queue.full():
            self.frame_queue.put(frame)
else:
print("队列已满，丢弃帧")

defget_result(self):
"""获取处理结果"""
ifnot self.result_queue.empty():
return self.result_queue.get()
returnNone

# 使用示例
processor = RealTimeVideoProcessor(RFDETRSmall())
processor.start()

# 模拟视频流
cap = cv2.VideoCapture(0)

whileTrue:
    ret, frame = cap.read()
ifnot ret:
break

# 添加帧到处理队列
    processor.add_frame(frame)

# 获取处理结果
    result = processor.get_result()
if result:
        frame, detections = result
# 可视化
        annotated = sv.BoxAnnotator().annotate(frame, detections)
        annotated = sv.LabelAnnotator().annotate(annotated, detections)
        cv2.imshow("Real-time Detection", annotated)

if cv2.waitKey(1)&0xFF==ord('q'):
break

cap.release()
processor.stop()
cv2.destroyAllWindows()

8.4 模型集成

将RF-DETR与其他模型结合使用可以提升性能。

classEnsembleDetector:
def__init__(self, models, weights=None):
"""
        集成多个检测器

        参数:
            models: 模型列表
            weights: 每个模型的权重（可选）
        """
        self.models = models
        self.weights = weights if weights else[1.0]*len(models)

defdetect(self, image, threshold=0.5):
"""集成检测"""
        all_detections =[]

# 每个模型进行检测
for model in self.models:
            detections = model.predict(image, threshold=threshold)
            all_detections.append(detections)

# 合并结果
return self.merge_detections(all_detections)

defmerge_detections(self, detections_list):
"""合并检测结果（加权平均）"""
# 这里实现合并逻辑
# 可以使用NMS、加权平均等方法
pass

# 使用示例
from rfdetr import RFDETRSmall, RFDETRBase

models =[RFDETRSmall(), RFDETRBase()]
detector = EnsembleDetector(models, weights=[0.3,0.7])

detections = detector.detect(image)

九、常见问题与解决方案

9.1 安装问题

问题1：pip安装失败

症状：

ERROR: Could not find a version that satisfies the requirement rfdetr

解决方案：

# 方案1：升级pip
pip install--upgrade pip

# 方案2：使用国内镜像
pip install rfdetr -i https://pypi.tuna.tsinghua.edu.cn/simple

# 方案3：从源码安装
pip install git+https://github.com/roboflow/rf-detr.git

# 方案4：指定Python版本
python3.10 -m pip install rfdetr

问题2：CUDA版本不匹配

症状：

RuntimeError: CUDA out of memory
或
OSError: libcudart.so.11.0: cannot open shared object file

解决方案：

# 1. 检查系统CUDA版本
nvcc --version

# 2. 检查PyTorch CUDA版本
python -c"import torch; print(torch.version.cuda)"

# 3. 如果不匹配，重新安装匹配的PyTorch

# 卸载旧版本
pip uninstall torch torchvision torchaudio

# 安装匹配的版本（以CUDA 11.8为例）
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

问题3：Windows上安装失败

症状：

error: Microsoft Visual C++ 14.0 or greater is required

解决方案：

# 1. 安装Visual C++ Redistributable
# 下载地址：https://aka.ms/vs/17/release/vc_redist.x64.exe

# 2. 或使用预编译的wheel文件
pip install rfdetr --no-cache-dir

# 3. 或使用conda
conda install-c conda-forge rfdetr

9.2 训练问题

问题1：训练过程中显存不足

症状：

RuntimeError: CUDA out of memory. Tried to allocate XXX MiB

解决方案：

# 方案1：减小batch size
model.train(
    dataset_dir="./dataset",
    epochs=50,
    batch_size=2,# 减小到2或1
    grad_accum_steps=8,# 增加梯度累积
    lr=1e-4,
    output_dir="./output"
)

# 方案2：降低输入分辨率
model = RFDETRBase(resolution=448)# 从默认576降到448

# 方案3：使用更小的模型
from rfdetr import RFDETRSmall, RFDETRNano

model = RFDETRSmall()# 或 RFDETRNano()

# 方案4：清理显存
import torch
torch.cuda.empty_cache()

# 方案5：使用混合精度训练
# (需要修改训练代码，使用torch.cuda.amp)

问题2：模型不收敛

症状：

训练loss不下降，一直很高
或
验证mAP很低

解决方案：

# 方案1：降低学习率
model.train(
    dataset_dir="./dataset",
    epochs=50,
    batch_size=4,
    lr=1e-5,# 从1e-4降到1e-5
    output_dir="./output"
)

# 方案2：检查数据标注
# 确保bbox格式正确：[x, y, width, height]
# 确保category_id正确

# 方案3：增加训练轮数
model.train(
    dataset_dir="./dataset",
    epochs=100,# 从50增加到100
    batch_size=4,
    lr=1e-4,
    output_dir="./output"
)

# 方案4：使用数据增强
# 在Roboflow中启用更多增强技术
# 或使用albumentations库

# 方案5：检查数据集大小
# 数据量太少可能导致不收敛
# 每类至少需要500-1000张图片

问题3：过拟合

症状：

训练loss持续下降，验证loss上升
或
训练集mAP很高，验证集mAP很低

解决方案：

# 方案1：增加训练数据
# 收集更多标注数据

# 方案2：使用数据增强
# 增加增强的强度和多样性

# 方案3：添加权重衰减
model.train(
    dataset_dir="./dataset",
    epochs=50,
    batch_size=4,
    lr=1e-4,
    weight_decay=0.001,# 添加权重衰减
    output_dir="./output"
)

# 方案4：早停
# 定期评估验证集，选择最佳模型
# 保存验证集mAP最高的模型

# 方案5：降低模型复杂度
# 使用更小的模型
from rfdetr import RFDETRSmall
model = RFDETRSmall()

9.3 推理问题

问题1：推理速度慢

症状：

单张图片推理时间超过100ms
或
无法达到实时帧率

解决方案：

# 方案1：使用更小的模型
from rfdetr import RFDETRSmall, RFDETRNano

model = RFDETRSmall()# 或 RFDETRNano()

# 方案2：降低输入分辨率
model = RFDETRBase(resolution=448)# 从默认576降到448

# 方案3：使用TensorRT加速
# 将模型导出为TensorRT引擎

# 方案4：启用FP16精度
# 在推理时使用半精度

# 方案5：批量推理
# 如果有多张图片，使用批量推理

问题2：检测精度低

症状：

漏检很多目标
或
检测框不准确

解决方案：

# 方案1：使用更大的模型
from rfdetr import RFDETRBase, RFDETRLarge

model = RFDETRBase()# 或 RFDETRLarge()

# 方案2：提高输入分辨率
model = RFDETRBase(resolution=672)# 从默认576提高到672

# 方案3：降低置信度阈值
detections = model.predict(image, threshold=0.3)# 从0.5降到0.3

# 方案4：微调模型
# 在自定义数据集上训练模型

# 方案5：使用更多训练数据
# 增加训练数据的数量和多样性

问题3：检测结果不稳定

症状：

同一张图片多次推理结果不同
或
视频中检测框抖动严重

解决方案：

# 方案1：使用确定性推理
import torch
torch.backends.cudnn.deterministic =True
torch.manual_seed(42)

# 方案2：提高置信度阈值
detections = model.predict(image, threshold=0.7)# 从0.5提高到0.7

# 方案3：使用检测结果平滑
# 在视频中使用卡尔曼滤波或匈牙利算法进行跟踪

# 方案4：使用模型集成
# 结合多个模型的检测结果

9.4 部署问题

问题1：ONNX导出失败

症状：

RuntimeError: Unsupported operator or unsupported data type

解决方案：

# 方案1：使用更低的opset版本
import torch
dummy_input = torch.randn(1,3,576,576)
torch.onnx.export(
    model.model,
    dummy_input,
"model.onnx",
    opset_version=14,# 从17降到14
    input_names=['input'],
    output_names=['output']
)

# 方案2：简化模型
# 删除不必要的层或操作

# 方案3：使用ONNX Simplifier
# pip install onnx-sim
# onnxsim model.onnx model_sim.onnx

问题2：TensorRT转换失败

症状：

[TRT] [E] layer validation failed
或
[TRT] [E] ONNX model has unsupported ops

解决方案：

# 方案1：使用更低的opset版本导出ONNX
# 参考上面的方案

# 方案2：禁用某些优化
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.DISABLE_TIMING_CACHE)

# 方案3：使用FP16而非INT8
config.set_flag(trt.BuilderFlag.FP16)
# 不要设置 config.set_flag(trt.BuilderFlag.INT8)

# 方案4：使用更新的TensorRT版本
pip install --upgrade tensorrt

十、RF-DETR最佳实践

10.1 数据准备最佳实践

1. 数据质量优先

标注质量：

确保标注准确，边界框紧密贴合目标
保持标注一致性，同一目标在不同图片中标注方式相同
避免遗漏目标和重复标注

数据清洗：

import os
from PIL import Image

defclean_dataset(image_dir, annotation_file, min_size=100):
"""清理数据集，移除异常图片"""

import json
withopen(annotation_file,'r')as f:
        coco_data = json.load(f)

    valid_images =[]
    valid_annotations =[]

for image_info in coco_data['images']:
        image_path = os.path.join(image_dir, image_info['file_name'])

try:
            image = Image.open(image_path)

# 检查图片是否损坏
            image.verify()
            image = Image.open(image_path)# 重新打开

# 检查图片尺寸
            width, height = image.size
if width < min_size or height < min_size:
print(f"图片太小，已跳过: {image_info['file_name']}")
continue

# 检查对应的标注
            image_annotations =[
                ann for ann in coco_data['annotations']
if ann['image_id']== image_info['id']
]

iflen(image_annotations)==0:
print(f"无标注，已跳过: {image_info['file_name']}")
continue

            valid_images.append(image_info)
            valid_annotations.extend(image_annotations)

except Exception as e:
print(f"图片损坏，已跳过: {image_info['file_name']}, 错误: {e}")

# 保存清理后的数据
    cleaned_data ={
'info': coco_data['info'],
'licenses': coco_data['licenses'],
'categories': coco_data['categories'],
'images': valid_images,
'annotations': valid_annotations
}

    output_file = annotation_file.replace('.json','_cleaned.json')
withopen(output_file,'w')as f:
        json.dump(cleaned_data, f, indent=2)

print(f"数据清理完成！")
print(f"- 原始图片: {len(coco_data['images'])}")
print(f"- 清理后图片: {len(valid_images)}")
print(f"- 清理后标注: {len(valid_annotations)}")

# 使用示例
clean_dataset(
    image_dir="./train",
    annotation_file="./train/_annotations.coco.json",
    min_size=100
)

2. 数据多样性

场景覆盖：

不同光照条件：早晨、中午、傍晚、夜间
不同天气：晴天、阴天、雨天、雪天
不同角度：正面、侧面、俯视、仰视
不同背景：复杂背景、简单背景

目标尺度多样性：

defanalyze_object_scales(annotation_file):
"""分析目标尺度分布"""
import json
import numpy as np

withopen(annotation_file,'r')as f:
        coco_data = json.load(f)

    image_sizes ={img['id']:(img['width'], img['height'])
for img in coco_data['images']}

    scales =[]
for ann in coco_data['annotations']:
if ann['image_id']in image_sizes:
            img_width, img_height = image_sizes[ann['image_id']]
            bbox_width, bbox_height = ann['bbox'][2], ann['bbox'][3]

# 计算目标相对于图片的大小
            area_ratio =(bbox_width * bbox_height)/(img_width * img_height)
            scales.append(area_ratio)

# 统计
    scales = np.array(scales)
print(f"目标尺度统计:")
print(f"- 最小: {scales.min():.4f}")
print(f"- 最大: {scales.max():.4f}")
print(f"- 平均: {scales.mean():.4f}")
print(f"- 中位数: {np.median(scales):.4f}")

# 检查是否有尺度缺失
    small_ratio = np.sum(scales <0.01)/len(scales)
    medium_ratio = np.sum((scales >=0.01)&(scales <0.1))/len(scales)
    large_ratio = np.sum(scales >=0.1)/len(scales)

print(f"\n尺度分布:")
print(f"- 小目标 (<1%): {small_ratio:.1%}")
print(f"- 中目标 (1%-10%): {medium_ratio:.1%}")
print(f"- 大目标 (>10%): {large_ratio:.1%}")

# 使用示例
analyze_object_scales("./train/_annotations.coco.json")

3. 数据平衡

类别平衡：

defanalyze_class_balance(annotation_file):
"""分析类别平衡性"""
import json
from collections import Counter

withopen(annotation_file,'r')as f:
        coco_data = json.load(f)

# 统计每个类别的数量
    class_counts = Counter(ann['category_id']for ann in coco_data['annotations'])

# 创建类别名称映射
    category_names ={cat['id']: cat['name']for cat in coco_data['categories']}

# 打印统计
print("类别分布:")
for cat_id insorted(class_counts.keys()):
        name = category_names.get(cat_id,f"Class_{cat_id}")
        count = class_counts[cat_id]
        ratio = count /sum(class_counts.values())
print(f"- {name}: {count} ({ratio:.1%})")

# 计算不平衡度
    counts =list(class_counts.values())
    imbalance_ratio =max(counts)/min(counts)ifmin(counts)>0elsefloat('inf')

print(f"\n不平衡度: {imbalance_ratio:.1f}")
if imbalance_ratio >10:
print("警告：数据集严重不平衡！")
print("建议：使用过采样或欠采样平衡数据")

# 使用示例
analyze_class_balance("./train/_annotations.coco.json")

10.2 训练最佳实践

1. 渐进式训练

classProgressiveTraining:
"""渐进式训练策略"""

def__init__(self, model):
        self.model = model

defstage1_quick_test(self, dataset_dir, output_dir):
"""阶段1：快速测试（验证数据集和配置）"""
print("="*50)
print("阶段1：快速测试")
print("="*50)

        self.model.train(
            dataset_dir=dataset_dir,
            epochs=3,
            batch_size=8,
            lr=1e-4,
            output_dir=f"{output_dir}/stage1"
)

print("阶段1完成，检查训练日志...")

defstage2_base_training(self, dataset_dir, output_dir):
"""阶段2：基础训练"""
print("="*50)
print("阶段2：基础训练")
print("="*50)

        self.model.train(
            dataset_dir=dataset_dir,
            epochs=30,
            batch_size=4,
            grad_accum_steps=4,
            lr=1e-4,
            output_dir=f"{output_dir}/stage2"
)

print("阶段2完成，评估模型性能...")

defstage3_finetuning(self, dataset_dir, output_dir, checkpoint):
"""阶段3：精细调优"""
print("="*50)
print("阶段3：精细调优")
print("="*50)

# 加载阶段2的最佳模型
        self.model = RFDETRBase(pretrain_weights=checkpoint)

        self.model.train(
            dataset_dir=dataset_dir,
            epochs=20,
            batch_size=4,
            grad_accum_steps=4,
            lr=1e-5,# 降低学习率
            output_dir=f"{output_dir}/stage3"
)

print("阶段3完成，最终模型已保存")

# 使用示例
from rfdetr import RFDETRBase

model = RFDETRBase()
trainer = ProgressiveTraining(model)

trainer.stage1_quick_test("./dataset","./output")
trainer.stage2_base_training("./dataset","./output")
trainer.stage3_finetuning("./dataset","./output","./output/stage2/best.pth")

2. 学习率调度

from rfdetr import RFDETRBase
import torch

model = RFDETRBase()

# 自定义训练循环（实现学习率调度）
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.0001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-6)

# 训练循环
for epoch inrange(50):
# 训练一个epoch
# train_one_epoch(...)

# 更新学习率
    scheduler.step()

# 打印当前学习率
    current_lr = optimizer.param_groups[0]['lr']
print(f"Epoch {epoch+1}, LR: {current_lr:.2e}")

3. 早停机制

classEarlyStopping:
"""早停机制"""

def__init__(self, patience=10, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.counter =0
        self.best_score =None
        self.early_stop =False

def__call__(self, val_score):
"""
        参数:
            val_score: 验证集分数（mAP，越高越好）
        """
if self.best_score isNone:
            self.best_score = val_score
elif val_score < self.best_score + self.min_delta:
            self.counter +=1
print(f"早停计数: {self.counter}/{self.patience}")
if self.counter >= self.patience:
                self.early_stop =True
else:
            self.best_score = val_score
            self.counter =0

return self.early_stop

# 使用示例
early_stopping = EarlyStopping(patience=10, min_delta=0.001)

for epoch inrange(100):
# 训练和验证
    train_loss = train_one_epoch(...)
    val_map = validate(...)

print(f"Epoch {epoch+1}, Train Loss: {train_loss:.4f}, Val mAP: {val_map:.4f}")

# 检查是否早停
if early_stopping(val_map):
print(f"提前停止于Epoch {epoch+1}")
break

10.3 部署最佳实践

1. 模型优化

defoptimize_model(model, output_dir):
"""模型优化流程"""

# 1. 导出ONNX
print("1. 导出ONNX...")
    model.export(output_path=f"{output_dir}/model.onnx")

# 2. 使用ONNX Simplifier简化模型
print("2. 简化ONNX模型...")
import onnxsim
    onnxsim.simplify(f"{output_dir}/model.onnx",f"{output_dir}/model_sim.onnx")

# 3. 转换为TensorRT引擎
print("3. 转换为TensorRT引擎...")
    convert_onnx_to_tensorrt(
        onnx_path=f"{output_dir}/model_sim.onnx",
        engine_path=f"{output_dir}/model.trt"
)

print("模型优化完成！")

2. 服务化部署

from fastapi import FastAPI, UploadFile, File, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
import io
from PIL import Image
import torch
from rfdetr import RFDETRBase
import uvicorn
import time

app = FastAPI(title="RF-DETR Detection Service")

# 添加CORS中间件
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# 加载模型
print("加载模型...")
device = torch.device('cuda'if torch.cuda.is_available()else'cpu')
model = RFDETRBase().to(device)
model.eval()
print("模型加载完成！")

# 统计信息
request_count =0
total_inference_time =0

@app.post("/detect")
asyncdefdetect(file: UploadFile = File(...)):
"""目标检测API"""
global request_count, total_inference_time

try:
# 读取图片
        contents =awaitfile.read()
        image = Image.open(io.BytesIO(contents))

# 推理
        start_time = time.time()
        detections = model.predict(image, threshold=0.5)
        inference_time = time.time()- start_time

# 更新统计
        request_count +=1
        total_inference_time += inference_time

# 转换结果
        results ={
"success":True,
"num_detections":len(detections),
"inference_time":round(inference_time *1000,2),# 毫秒
"detections":[
{
"bbox": box.tolist(),
"class_id":int(class_id),
"score":float(score)
}
for box, class_id, score inzip(
                    detections.xyxy,
                    detections.class_id,
                    detections.confidence
)
]
}

return JSONResponse(content=results)

except Exception as e:
raise HTTPException(status_code=500, detail=str(e))

@app.get("/stats")
asyncdefstats():
"""统计信息"""
    avg_time = total_inference_time / request_count if request_count >0else0
return{
"request_count": request_count,
"average_inference_time_ms":round(avg_time *1000,2),
"average_fps":round(1.0/ avg_time,2)if avg_time >0else0
}

@app.get("/health")
asyncdefhealth():
"""健康检查"""
return{
"status":"ok",
"device":str(device),
"cuda_available": torch.cuda.is_available()
}

if __name__ =="__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, workers=1)

3. 监控与维护

import logging
from datetime import datetime

classModelMonitor:
"""模型监控"""

def__init__(self, log_file="model_monitor.log"):
        logging.basicConfig(
            filename=log_file,
            level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
        self.logger = logging.getLogger('ModelMonitor')

deflog_inference(self, image_info, detections, inference_time):
"""记录推理信息"""
        self.logger.info(f"Inference - Image: {image_info}, "
f"Objects: {len(detections)}, "
f"Time: {inference_time*1000:.2f}ms")

deflog_error(self, error):
"""记录错误"""
        self.logger.error(f"Error: {str(error)}")

deflog_performance(self, request_count, avg_time):
"""记录性能指标"""
        self.logger.info(f"Performance - Requests: {request_count}, "
f"Avg Time: {avg_time*1000:.2f}ms, "
f"FPS: {1.0/avg_time:.2f}")

# 使用示例
monitor = ModelMonitor()

monitor.log_inference("test.jpg", detections,0.045)
monitor.log_performance(1000,0.042)

十一、RF-DETR应用案例

11.1 工业检测：PCB板缺陷检测

from rfdetr import RFDETRBase
import cv2
import numpy as np
from PIL import Image

classPCBDefectDetector:
"""PCB板缺陷检测器"""

def__init__(self, model_path):
# 加载训练好的PCB检测模型
        self.model = RFDETRBase(pretrain_weights=model_path)

# 缺陷类型映射
        self.defect_types ={
0:"短路",
1:"开路",
2:"缺失元件",
3:"错位元件",
4:"焊锡不足",
5:"焊锡过多"
}

defdetect(self, image_path, threshold=0.7):
"""检测PCB缺陷"""
# 加载图片
        image = cv2.imread(image_path)
        pil_image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# 推理
        detections = self.model.predict(pil_image, threshold=threshold)

# 统计缺陷
        defects =[]
for box, class_id, score inzip(
            detections.xyxy,
            detections.class_id,
            detections.confidence
):
            defect_type = self.defect_types.get(class_id,f"未知({class_id})")
            defects.append({
'type': defect_type,
'box': box.tolist(),
'score':float(score)
})

# 可视化
        annotated = image.copy()
for defect in defects:
            box =[int(x)for x in defect['box']]
            cv2.rectangle(annotated,(box[0], box[1]),(box[2], box[3]),(0,0,255),2)

# 添加标签
            label =f"{defect['type']}{defect['score']:.2f}"
            cv2.putText(annotated, label,(box[0], box[1]-10),
                       cv2.FONT_HERSHEY_SIMPLEX,0.5,(0,0,255),2)

# 生成报告
        report ={
'total_defects':len(defects),
'defects': defects,
'image_path': image_path,
'timestamp': datetime.now().isoformat()
}

return report, annotated

defbatch_detect(self, image_dir, output_dir):
"""批量检测"""
import os
        os.makedirs(output_dir, exist_ok=True)

        all_reports =[]

for filename in os.listdir(image_dir):
if filename.endswith(('.jpg','.png')):
                image_path = os.path.join(image_dir, filename)

# 检测
                report, annotated = self.detect(image_path)

# 保存结果
                all_reports.append(report)
                output_path = os.path.join(output_dir,f"result_{filename}")
                cv2.imwrite(output_path, annotated)

print(f"已处理: {filename}, 缺陷数: {report['total_defects']}")

# 生成汇总报告
        summary = self.generate_summary(all_reports)

return summary, all_reports

defgenerate_summary(self, reports):
"""生成汇总报告"""
from collections import Counter

        total_images =len(reports)
        total_defects =sum(r['total_defects']for r in reports)

# 统计各类型缺陷数量
        defect_counts = Counter()
for report in reports:
for defect in report['defects']:
                defect_counts[defect['type']]+=1

        summary ={
'total_images': total_images,
'total_defects': total_defects,
'average_defects_per_image': total_defects / total_images if total_images >0else0,
'defect_distribution':dict(defect_counts)
}

return summary

# 使用示例
detector = PCBDefectDetector("./pcb_model.pth")

# 单张图片检测
report, annotated = detector.detect("pcb_001.jpg")
print(f"检测到 {report['total_defects']} 个缺陷")
cv2.imwrite("result_pcb_001.jpg", annotated)

# 批量检测
summary, reports = detector.batch_detect("./pcb_images","./pcb_results")
print("\n批量检测汇总:")
print(f"- 总图片数: {summary['total_images']}")
print(f"- 总缺陷数: {summary['total_defects']}")
print(f"- 平均每张缺陷数: {summary['average_defects_per_image']:.2f}")
print("- 缺陷分布:")
for defect_type, count in summary['defect_distribution'].items():
print(f"  - {defect_type}: {count}")

11.2 自动驾驶：行人车辆检测

from rfdetr import RFDETRSmall
import cv2
import time
import numpy as np
from rfdetr.util.coco_classes import COCO_CLASSES

classAutonomousDrivingDetector:
"""自动驾驶检测器"""

def__init__(self, model_size='small'):
# 使用Small模型平衡速度和精度
if model_size =='small':
from rfdetr import RFDETRSmall
            self.model = RFDETRSmall()
elif model_size =='base':
from rfdetr import RFDETRBase
            self.model = RFDETRBase()
else:
from rfdetr import RFDETRNano
            self.model = RFDETRNano()

# 关键目标类别
        self.critical_objects ={
'person':1,
'car':3,
'truck':8,
'bus':6,
'motorcycle':4,
'bicycle':2
}

defdetect(self, frame, threshold=0.5):
"""检测关键目标"""
        start_time = time.time()

# 推理
        detections = self.model.predict(frame, threshold=threshold)

# 筛选关键目标
        critical_objects =[]
for box, class_id, score inzip(
            detections.xyxy,
            detections.class_id,
            detections.confidence
):
            class_name = COCO_CLASSES[class_id]
if class_name in self.critical_objects:
                critical_objects.append({
'type': class_name,
'box': box.tolist(),
'score':float(score),
'priority': self.critical_objects[class_name]
})

# 按优先级排序
        critical_objects.sort(key=lambda x: x['priority'], reverse=True)

# 计算FPS
        inference_time = time.time()- start_time
        fps =1.0/ inference_time if inference_time >0else0

return critical_objects, fps

defvisualize(self, frame, objects):
"""可视化检测结果"""
        annotated = frame.copy()

for obj in objects:
            box =[int(x)for x in obj['box']]

# 根据优先级设置颜色
if obj['priority']>=3:
                color =(0,0,255)# 红色：高优先级
elif obj['priority']==2:
                color =(0,255,0)# 绿色：中优先级
else:
                color =(255,0,0)# 蓝色：低优先级

# 绘制边界框
            cv2.rectangle(annotated,(box[0], box[1]),(box[2], box[3]), color,2)

# 添加标签
            label =f"{obj['type']}{obj['score']:.2f}"
            cv2.putText(annotated, label,(box[0], box[1]-10),
                       cv2.FONT_HERSHEY_SIMPLEX,0.5, color,2)

return annotated

defanalyze_scene(self, objects, frame_shape):
"""分析场景"""
        width, height = frame_shape[1], frame_shape[0]

# 划分区域
        regions ={
'near':0,# 近距离（底部1/3）
'middle':0,# 中距离（中间1/3）
'far':0# 远距离（顶部1/3）
}

for obj in objects:
            box = obj['box']
            center_y =(box[1]+ box[3])/2

if center_y > height *2/3:
                regions['near']+=1
elif center_y > height /3:
                regions['middle']+=1
else:
                regions['far']+=1

return regions

defrun_realtime(self, camera_id=0, output_video=None):
"""运行实时检测"""
        cap = cv2.VideoCapture(camera_id)

if output_video:
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            fps = cap.get(cv2.CAP_PROP_FPS)
            width =int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            height =int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            out = cv2.VideoWriter(output_video, fourcc, fps,(width, height))

print("实时检测已启动，按 'q' 键退出...")
print("按 's' 键保存当前帧")

        frame_count =0
        total_fps =0

whileTrue:
            ret, frame = cap.read()
ifnot ret:
break

# 检测
            objects, fps = self.detect(frame)

# 分析场景
            regions = self.analyze_scene(objects, frame.shape)

# 可视化
            annotated = self.visualize(frame, objects)

# 显示统计信息
            info_text =[
f"FPS: {fps:.1f}",
f"Objects: {len(objects)}",
f"Near: {regions['near']}",
f"Middle: {regions['middle']}",
f"Far: {regions['far']}"
]

for i, text inenumerate(info_text):
                cv2.putText(annotated, text,(10,30+ i*30),
                           cv2.FONT_HERSHEY_SIMPLEX,0.7,(0,255,0),2)

# 显示结果
            cv2.imshow("Autonomous Driving Detection", annotated)

# 保存视频
if output_video:
                out.write(annotated)

# 统计
            frame_count +=1
            total_fps += fps
if frame_count %30==0:
                avg_fps = total_fps / frame_count
print(f"已处理 {frame_count} 帧，平均FPS: {avg_fps:.1f}")

# 键盘控制
            key = cv2.waitKey(1)&0xFF
if key ==ord('q'):# 退出
break
elif key ==ord('s'):# 保存帧
                timestamp = time.strftime("%Y%m%d_%H%M%S")
                cv2.imwrite(f"frame_{timestamp}.jpg", annotated)
print(f"已保存帧: frame_{timestamp}.jpg")

        cap.release()
if output_video:
            out.release()
        cv2.destroyAllWindows()

# 最终统计
        avg_fps = total_fps / frame_count if frame_count >0else0
print(f"\n统计信息:")
print(f"- 总帧数: {frame_count}")
print(f"- 平均FPS: {avg_fps:.2f}")

# 使用示例
detector = AutonomousDrivingDetector(model_size='small')

# 运行实时检测
detector.run_realtime(camera_id=0, output_video="driving_detection.mp4")

11.3 智能安防：入侵检测

from rfdetr import RFDETRBase
import cv2
import numpy as np
from rfdetr.util.coco_classes import COCO_CLASSES

classIntrusionDetector:
"""入侵检测器"""

def__init__(self, model_path=None, alert_zone=None):
# 加载模型
if model_path:
            self.model = RFDETRBase(pretrain_weights=model_path)
else:
            self.model = RFDETRBase()

# 设置警戒区域
        self.alert_zone = alert_zone if alert_zone else[
(100,100),(500,100),(500,400),(100,400)
]

# 历史记录
        self.detection_history =[]

defset_alert_zone(self, points):
"""设置警戒区域"""
        self.alert_zone = points

defis_point_in_polygon(self, point, polygon):
"""判断点是否在多边形内"""
        x, y = point
        n =len(polygon)
        inside =False

        p1x, p1y = polygon[0]
for i inrange(n +1):
            p2x, p2y = polygon[i % n]
if y >min(p1y, p2y):
if y <=max(p1y, p2y):
if x <=max(p1x, p2x):
if p1y != p2y:
                            xinters =(y - p1y)*(p2x - p1x)/(p2y - p1y)+ p1x
if p1x == p2x or x <= xinters:
                            inside =not inside
            p1x, p1y = p2x, p2y

return inside

defdetect_intrusion(self, frame, threshold=0.6):
"""检测入侵"""
# 推理
        detections = self.model.predict(frame, threshold=threshold)

# 检查是否有目标进入警戒区域
        intrusions =[]

for box, class_id, score inzip(
            detections.xyxy,
            detections.class_id,
            detections.confidence
):
# 计算目标中心点
            center_x =(box[0]+ box[2])/2
            center_y =(box[1]+ box[3])/2

# 检查是否在警戒区域内
if self.is_point_in_polygon((center_x, center_y), self.alert_zone):
                intrusions.append({
'type': COCO_CLASSES[class_id],
'box': box.tolist(),
'score':float(score),
'center':(float(center_x),float(center_y))
})

return intrusions

defvisualize(self, frame, intrusions):
"""可视化检测结果"""
        annotated = frame.copy()

# 绘制警戒区域
        pts = np.array(self.alert_zone, np.int32)
        cv2.polylines(annotated,[pts],True,(0,0,255),2)

# 填充警戒区域
        cv2.fillPoly(annotated,[pts],(0,0,50))

# 绘制入侵目标
for intrusion in intrusions:
            box =[int(x)for x in intrusion['box']]

# 绘制边界框
            cv2.rectangle(annotated,(box[0], box[1]),(box[2], box[3]),(0,255,0),2)

# 添加标签
            label =f"{intrusion['type']}{intrusion['score']:.2f}"
            cv2.putText(annotated, label,(box[0], box[1]-10),
                       cv2.FONT_HERSHEY_SIMPLEX,0.5,(0,255,0),2)

# 添加警告信息
iflen(intrusions)>0:
            warning_text =f"WARNING: {len(intrusions)} intrusion(s) detected!"
            cv2.putText(annotated, warning_text,(50,50),
                       cv2.FONT_HERSHEY_SIMPLEX,1.0,(0,0,255),3)

return annotated

defrun_monitoring(self, video_source=0, output_video=None, alert_callback=None):
"""运行监控"""
        cap = cv2.VideoCapture(video_source)

if output_video:
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            fps = cap.get(cv2.CAP_PROP_FPS)
            width =int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            height =int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            out = cv2.VideoWriter(output_video, fourcc, fps,(width, height))

print("入侵检测监控已启动，按 'q' 键退出...")
print("按 'r' 键重置警戒区域")

        frame_count =0
        intrusion_count =0

whileTrue:
            ret, frame = cap.read()
ifnot ret:
break

# 检测入侵
            intrusions = self.detect_intrusion(frame)

# 触发警报
iflen(intrusions)>0:
                intrusion_count +=1
if alert_callback:
                    alert_callback(intrusions, frame_count)

# 可视化
            annotated = self.visualize(frame, intrusions)

# 显示统计信息
            info_text =[
f"Frame: {frame_count}",
f"Intrusions: {intrusion_count}",
f"Current: {len(intrusions)}"
]

for i, text inenumerate(info_text):
                cv2.putText(annotated, text,(10,30+ i*30),
                           cv2.FONT_HERSHEY_SIMPLEX,0.7,(0,255,0),2)

# 显示结果
            cv2.imshow("Intrusion Detection", annotated)

# 保存视频
if output_video:
                out.write(annotated)

            frame_count +=1

# 键盘控制
            key = cv2.waitKey(1)&0xFF
if key ==ord('q'):# 退出
break
elif key ==ord('r'):# 重置警戒区域
                self.reset_alert_zone(annotated)

        cap.release()
if output_video:
            out.release()
        cv2.destroyAllWindows()

print(f"\n监控结束:")
print(f"- 总帧数: {frame_count}")
print(f"- 入侵次数: {intrusion_count}")

defreset_alert_zone(self, frame):
"""交互式设置警戒区域"""
print("\n点击4个点设置警戒区域...")

        points =[]

defmouse_callback(event, x, y, flags, param):
if event == cv2.EVENT_LBUTTONDOWN:
                points.append((x, y))
                cv2.circle(frame,(x, y),5,(0,255,0),-1)
                cv2.putText(frame,str(len(points)),(x-5, y-10),
                           cv2.FONT_HERSHEY_SIMPLEX,0.5,(0,255,0),2)
                cv2.imshow("Set Alert Zone", frame)

iflen(points)==4:
                    cv2.polylines(frame,[np.array(points)],True,(0,0,255),2)
                    cv2.imshow("Set Alert Zone", frame)
print(f"警戒区域已设置: {points}")

        cv2.namedWindow("Set Alert Zone")
        cv2.setMouseCallback("Set Alert Zone", mouse_callback)
        cv2.imshow("Set Alert Zone", frame)

whilelen(points)<4:
            key = cv2.waitKey(1)&0xFF
if key ==ord('q'):
break

        cv2.destroyWindow("Set Alert Zone")

iflen(points)==4:
            self.alert_zone = points
print("警戒区域设置完成！")

defalert_callback(intrusions, frame_count):
"""警报回调函数"""
print(f"[ALERT] Frame {frame_count}: 检测到 {len(intrusions)} 个入侵目标！")
for intrusion in intrusions:
print(f"  - {intrusion['type']} (置信度: {intrusion['score']:.2f})")

# 使用示例
detector = IntrusionDetector()

# 运行监控
detector.run_monitoring(
    video_source=0,# 摄像头
    output_video="intrusion_detection.mp4",
    alert_callback=alert_callback
)

官方资源：

GitHub仓库：https://github.com/roboflow/rf-detr
Roboflow平台：https://roboflow.com
技术文档：https://roboflow.com/docs

相关论文：

RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
Deformable DETR: Deformable Transformers for End-to-End Object Detection
DINOv2: Learning Robust Visual Features Without Supervision
DETR: End-to-End Object Detection with Transformers

Q1: RF-DETR与YOLO的主要区别是什么？

A: 主要区别包括：

架构：RF-DETR基于Transformer，YOLO基于CNN
后处理：RF-DETR不需要NMS，YOLO需要NMS
训练：RF-DETR端到端训练，YOLO需要多阶段训练
泛化：RF-DETR泛化能力更强，适应不同领域
性能：在相同精度下，RF-DETR速度更快

Q2: 训练需要多少数据？

A: 根据任务复杂度：

简单任务（<10类）：每类500-1000张图片
中等任务（10-50类）：每类1000-2000张图片
复杂任务（>50类）：每类2000+张图片

Q3: 如何选择合适的模型大小？

A: 根据应用场景：

移动端/边缘设备：Nano
实时应用：Small或Base
高精度需求：Large
研究用途：XLarge或2XLarge

Q4: 能否在CPU上运行？

A: 可以，但速度会很慢。建议使用GPU加速推理。

Q5: 如何提高检测精度？

A: 可以尝试：

使用更大的模型
增加训练数据
提高输入分辨率
使用数据增强
调整超参数

Q6: 训练时间需要多久？

A: 取决于数据量和硬件：

小数据集（<1000张）：2-4小时
中等数据集（1000-10000张）：1-3天
大数据集（>10000张）：3-7天

Q7: 模型文件有多大？

A: 不同模型大小：

Nano: ~20MB
Small: ~40MB
Base: ~100MB
Large: ~500MB

Q8: 如何部署到生产环境？

A: 常用方法：

导出ONNX格式
使用TensorRT优化
搭建REST API服务
Docker容器化
使用Kubernetes部署

12.5 未来展望

RF-DETR代表了目标检测技术的最新进展，但这个领域仍在快速发展。未来可能的发展方向包括：

1. 性能持续提升

更高的检测精度
更快的推理速度
更小的模型体积

2. 多模态融合

结合视觉和语言信息
支持开放词汇检测
增强零样本能力

3. 端到端优化

简化训练流程
自动化超参数调优
降低部署门槛

4. 新兴应用场景

医疗影像诊断
自动驾驶
机器人视觉
AR/VR应用

5. 边缘AI发展

更高效的边缘部署
低功耗设计
实时性能提升

附录A：快速参考

A.1 常用命令

# 安装
pip install rfdetr

# 推理
python -c"from rfdetr import RFDETRBase; model = RFDETRBase(); detections = model.predict('image.jpg')"

# 训练
python -c"from rfdetr import RFDETRBase; model = RFDETRBase(); model.train(dataset_dir='./dataset', epochs=50, batch_size=4, lr=1e-4, output_dir='./output')"

# 导出ONNX
python -c"from rfdetr import RFDETRBase; model = RFDETRBase(); model.export()"