Step by Step: Hands-On Faster R-CNN Object Detection on the 123,287 Images of COCO
Environment Setup
Make sure Python 3.6+ and PyTorch 1.7+ are installed. Anaconda is recommended for managing the environment:
conda create -n fasterrcnn python=3.8
conda activate fasterrcnn
pip install torch torchvision torchaudio
pip install pycocotools opencv-python
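Before moving on, it can help to confirm that the packages installed above can be found from the new environment. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and `cv2` is the import name provided by the `opencv-python` package.

```python
import importlib.util

# Import names for the packages installed above (opencv-python is imported as cv2).
REQUIRED = ["torch", "torchvision", "pycocotools", "cv2"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that each package can be located. It does not verify versions or GPU support.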
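Downloading and Preparing the Dataset
Download the dataset (2017 release) from the official COCO website:
- Training set: http://images.cocodataset.org/zips/train2017.zip
- Validation set: http://images.cocodataset.org/zips/val2017.zip
- Annotations: http://images.cocodataset.org/annotations/annotations_trainval2017.zip
After extraction, the directory structure should look like this: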
coco/
├── annotations/
│ ├── instances_train2017.json
│ └── instances_val2017.json
├── train2017/
└── val2017/
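A quick way to confirm that the extracted files match this layout is sketched below. It assumes the dataset root is a directory named `coco` in the current working directory. The helper `missing_coco_paths` and the `EXPECTED` list are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Paths expected under the dataset root, matching the tree above.
EXPECTED = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]


def missing_coco_paths(root):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_paths("coco")
    print("Missing:", missing if missing else "nothing, layout looks complete")
```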
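Building the Model
Use the Faster R-CNN model predefined in Torchvision:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def get_model(num_classes):
    # Load Faster R-CNN (ResNet-50 + FPN) with COCO-pretrained weights.
    # Note: pretrained=True is deprecated since torchvision 0.13 in favor of the weights= argument.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    # Replace the box predictor so the head outputs num_classes scores.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

# COCO has 80 object categories, but their category_id values run from 1 to 90 with gaps.
# The dataset class uses category_id directly as the label, so the head needs
# 91 outputs (index 0 is the background class).
model = get_model(91)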
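One thing to watch when choosing `num_classes`: the `category_id` values in COCO are not consecutive. They range from 1 to 90 with gaps (for example, there is no ID 12), and a classification head indexes its classes from 0 to `num_classes - 1`, so raw IDs only fit when `num_classes` exceeds the largest ID. An alternative is to remap the IDs to contiguous labels. The sketch below shows one way to do that. The helper `build_label_maps` is hypothetical and the sample IDs are an illustrative subset. With pycocotools, the full list could be obtained from `coco.getCatIds()`.

```python
def build_label_maps(category_ids):
    """Map sparse COCO category IDs to contiguous labels 1..N (0 is background)."""
    sorted_ids = sorted(category_ids)
    cat_id_to_label = {cat_id: i + 1 for i, cat_id in enumerate(sorted_ids)}
    # Inverse map, needed to turn predicted labels back into COCO category IDs.
    label_to_cat_id = {label: cat_id for cat_id, label in cat_id_to_label.items()}
    return cat_id_to_label, label_to_cat_id


if __name__ == "__main__":
    # Illustrative subset of COCO category IDs (note the gap between 11 and 13).
    cat_id_to_label, label_to_cat_id = build_label_maps([1, 2, 3, 11, 13])
    print(cat_id_to_label)
    print(label_to_cat_id)
```

If labels are remapped for training, predictions need to be mapped back through `label_to_cat_id` before they are passed to the COCO evaluation tools.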
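Implementing the Data Loader
Create a custom dataset class that reads COCO-format annotations:
import os

import cv2
import torch
from pycocotools.coco import COCO
from torch.utils.data import Dataset

class CocoDataset(Dataset):
    def __init__(self, root, annotation, transforms=None):
        self.root = root
        self.transforms = transforms
        self.coco = COCO(annotation)
        self.ids = list(sorted(self.coco.imgs.keys()))

    def __len__(self):
        # Required by DataLoader to know the dataset size.
        return len(self.ids)

    def __getitem__(self, idx):
        img_id = self.ids[idx]
        ann_ids = self.coco.getAnnIds(imgIds=img_id)
        annotations = self.coco.loadAnns(ann_ids)
        img_info = self.coco.loadImgs(img_id)[0]
        img_path = os.path.join(self.root, img_info['file_name'])
        # OpenCV loads images as BGR; convert to RGB.
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        # Convert HWC uint8 to a CHW float tensor in [0, 1], as the model expects.
        img = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
        boxes = []
        labels = []
        for ann in annotations:
            # COCO stores boxes as [x, y, width, height]; the model expects [x1, y1, x2, y2].
            xmin, ymin, w, h = ann['bbox']
            if w <= 0 or h <= 0:
                continue  # the model rejects boxes with non-positive width or height
            boxes.append([xmin, ymin, xmin + w, ymin + h])
            labels.append(ann['category_id'])
        target = {
            # reshape keeps the (N, 4) shape even when an image has no boxes
            'boxes': torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4),
            'labels': torch.as_tensor(labels, dtype=torch.int64),
            'image_id': torch.tensor([img_id])
        }
        if self.transforms:
            # Applied to the image only; geometric transforms would also need to update the boxes.
            img = self.transforms(img)
        return img, target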
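The box handling in `__getitem__` is the step most likely to go wrong, so here it is in isolation with concrete numbers. COCO annotations store each box as `[x, y, width, height]`, while the Torchvision detection models expect `[x1, y1, x2, y2]`. This is a minimal sketch. The helper name `annotations_to_xyxy` is hypothetical, and dropping boxes with non-positive width or height is an assumption about how degenerate annotations should be handled.

```python
def annotations_to_xyxy(annotations):
    """Convert COCO [x, y, w, h] boxes to [x1, y1, x2, y2], skipping degenerate ones."""
    boxes, labels = [], []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            continue  # a box with no area cannot be used as a training target
        boxes.append([x, y, x + w, y + h])
        labels.append(ann["category_id"])
    return boxes, labels


if __name__ == "__main__":
    sample = [
        {"bbox": [10, 20, 30, 40], "category_id": 1},
        {"bbox": [5, 5, 0, 10], "category_id": 2},  # zero width, will be skipped
    ]
    print(annotations_to_xyxy(sample))
```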
Training
Configure the training parameters and start training:
import torch
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

# Optimize only the parameters that require gradients.
params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
# Multiply the learning rate by 0.1 every 3 epochs.
lr_scheduler = StepLR(optimizer, step_size=3, gamma=0.1)

dataset = CocoDataset('coco/train2017', 'coco/annotations/instances_train2017.json')
# Images differ in size, so each batch stays a tuple of samples instead of one stacked tensor.
data_loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=lambda x: tuple(zip(*x)))

for epoch in range(10):
    model.train()
    for images, targets in data_loader:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        # In training mode the model returns a dict of losses.
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()
    # Step the scheduler once per epoch.
    lr_scheduler.step()
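To see what the `StepLR(step_size=3, gamma=0.1)` setting does over the 10 epochs, the schedule can be worked out by hand: the learning rate in effect during epoch `e` is `base_lr * gamma ** (e // step_size)`. The sketch below reproduces that arithmetic in plain Python, assuming the scheduler is stepped once per epoch. The helper name `step_lr` is hypothetical and is not part of PyTorch.

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate in effect during `epoch` (0-indexed) under step decay."""
    return base_lr * gamma ** (epoch // step_size)


if __name__ == "__main__":
    for epoch in range(10):
        print(f"epoch {epoch}: lr = {step_lr(0.005, epoch):.1e}")
```

Under these settings the last epoch runs at one thousandth of the starting rate, so for longer schedules a larger `step_size` may be worth considering.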
Model Evaluation
Evaluate with the official COCO metrics:
import cv2
import torch
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

model.eval()
coco_gt = COCO('coco/annotations/instances_val2017.json')
img_ids = coco_gt.getImgIds()[:100]  # evaluate the first 100 images
results = []
with torch.no_grad():
    for img_id in img_ids:
        img_info = coco_gt.loadImgs(img_id)[0]
        img_path = f"coco/val2017/{img_info['file_name']}"
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img_tensor = torch.from_numpy(img / 255.).permute(2, 0, 1).float().to(device)
        pred = model([img_tensor])
        boxes = pred[0]['boxes'].cpu().numpy()
        scores = pred[0]['scores'].cpu().numpy()
        labels = pred[0]['labels'].cpu().numpy()
        for i in range(len(boxes)):
            # The model outputs [x1, y1, x2, y2]; COCO results use [x, y, width, height].
            x1, y1, x2, y2 = (float(v) for v in boxes[i])
            results.append({
                'image_id': img_id,
                'category_id': int(labels[i]),
                'bbox': [x1, y1, x2 - x1, y2 - y1],
                'score': float(scores[i])
            })
coco_dt = coco_gt.loadRes(results)
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
# Restrict evaluation to the images that were actually scored.
coco_eval.params.imgIds = img_ids
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
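The numbers printed by `summarize()` are built on the intersection over union (IoU) between predicted and ground-truth boxes. The headline AP is averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05. As a rough illustration of the underlying quantity, the sketch below computes IoU for two boxes in `[x1, y1, x2, y2]` form. The helper `box_iou` is a hypothetical name, and pycocotools uses its own implementation internally.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the overlapping rectangle (it may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10x10 boxes overlapping in a 5x5 region: 25 / (100 + 100 - 25).
    print(box_iou([0, 0, 10, 10], [5, 5, 15, 15]))
```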
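Key Notes
Using at least 4 GPUs is recommended to speed up training. Note that the training loop above runs on a single device; multi-GPU training additionally requires a wrapper such as torch.nn.parallel.DistributedDataParallel. If GPU memory is insufficient, reduce batch_size or use gradient accumulation. Training a full Faster R-CNN on COCO typically takes 12-24 hours on 4x V100 GPUs.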
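Gradient accumulation can be sketched as follows: gradients from several small batches are summed before a single optimizer step, which approximates a larger batch without the extra memory. This is a minimal sketch, not a tuned implementation. The function name `train_one_epoch_accumulated` and the parameter `accum_steps` are hypothetical, and the code assumes `model`, `optimizer`, and `data_loader` behave like the PyTorch objects in the training section (the model returns a dict of losses in training mode).

```python
def train_one_epoch_accumulated(model, optimizer, data_loader, device, accum_steps=4):
    """Run one epoch, stepping the optimizer once every `accum_steps` batches."""
    model.train()
    optimizer.zero_grad()
    num_batches = 0
    for num_batches, (images, targets) in enumerate(data_loader, start=1):
        images = [image.to(device) for image in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        # Scale the loss so the accumulated gradient is an average over the group.
        loss = sum(loss_dict.values()) / accum_steps
        loss.backward()  # gradients add up across calls until zero_grad()
        if num_batches % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
    # Apply gradients left over from an incomplete final group.
    if num_batches % accum_steps != 0:
        optimizer.step()
        optimizer.zero_grad()
```

With `batch_size=2` and `accum_steps=4`, each optimizer step sees gradients from 8 images, while peak memory stays close to that of a batch of 2.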