Breaking the Performance Bottleneck: A Complete Guide to Distributed Processing with video-subtitle-extractor
Still waiting hours for subtitles to come out of a 4K movie? Overwhelmed when batch-processing hundreds of lecture videos? This guide walks you through building a subtitle-extraction cluster out of several ordinary machines: with distributed task scheduling, throughput improves by roughly 3-10x, removing the single-machine compute bottleneck.
What you will get from this article:
- Deployment plans for 3 distributed architectures
- A working implementation of cross-device task allocation
- 5 key tuning parameters for dynamic load balancing
- ~150 lines of code for a distributed control node
- 8 diagnostic tools for common troubleshooting
Distributed Architecture Design: From Theory to Practice
Where Subtitle Extraction Spends Its Time
Hard-subtitle extraction involves two compute-intensive stages — subtitle-region detection and OCR recognition — and both become clear bottlenecks on a single machine:
Table: single-machine processing time by video resolution
| Video spec | 10-min sample | 90-min movie | 10-hour course |
|---|---|---|---|
| 720p | 8 min | 1.2 h | 12 h |
| 1080p | 15 min | 2.5 h | 25 h |
| 4K | 42 min | 6.8 h | 72 h |
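To see where distribution pays off, the table can be turned into a back-of-envelope model. This is only a sketch: the even-split assumption and the 2-minute per-chunk overhead are illustrative assumptions, not measurements.

```python
def estimate_wall_time(single_machine_minutes: float, num_nodes: int,
                       per_chunk_overhead_min: float = 2.0) -> float:
    """Idealized distributed wall time: the work splits evenly across nodes,
    plus a fixed per-chunk cost for splitting, transfer, and merging."""
    return single_machine_minutes / num_nodes + per_chunk_overhead_min

# A 90-minute 1080p movie takes ~150 min on one machine (2.5 h, table above);
# on 5 nodes this model predicts 150/5 + 2 = 32 minutes
print(estimate_wall_time(150, 5))  # → 32.0
```

Real clusters fall short of this ideal because of stragglers and network transfer, which is exactly what the load-balancing section later addresses.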
Choosing a Distributed Architecture
Given the characteristics of subtitle-extraction workloads, three architectures are worth considering:
1. Master-worker (recommended for getting started)
Key traits:
- The control node splits tasks and merges results
- Worker nodes only run OCR recognition
- Suited to small clusters of 3-10 machines
- Low implementation complexity, easy to deploy
2. Peer-to-peer (recommended at enterprise scale)
Key traits:
- No central node; any node can initiate a task
- Node discovery via a DHT protocol
- Scales in and out dynamically; a single node failure does not take down the cluster
- Suited to clusters of 10+ machines
3. Hybrid (recommended for production)
Key traits:
- A central controller manages metadata and task scheduling
- Worker nodes are grouped by function (detection / recognition / post-processing)
- Mixed GPU/CPU node deployment is supported
- Built-in failover and task-retry mechanisms
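For the hybrid layout, the functional grouping can be captured in a small declarative map that the controller reads at startup. This is a hypothetical sketch — the node names and group labels are made up for illustration:

```python
# Hypothetical role map for a hybrid cluster: the controller routes work
# to a node according to its functional group.
NODE_GROUPS = {
    "detect": ["gpu-01", "gpu-02"],            # subtitle-region detection (GPU)
    "ocr":    ["gpu-03", "cpu-01", "cpu-02"],  # text recognition (mixed GPU/CPU)
    "post":   ["cpu-03"],                      # de-dup, merging, SRT output
}

def group_of(node_id: str) -> str:
    """Reverse lookup: which functional group does a node belong to?"""
    for group, members in NODE_GROUPS.items():
        if node_id in members:
            return group
    raise KeyError(node_id)

print(group_of("cpu-01"))  # → ocr
```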
Cluster Deployment in Practice: Building the System from Scratch
Environment and Dependencies
Minimum control-node spec:
- CPU: 4 cores / 8 threads
- RAM: 16 GB
- Storage: 100 GB SSD
- OS: Ubuntu 20.04 LTS
Minimum worker-node spec:
- CPU: 2 cores / 4 threads, or GPU: NVIDIA GTX 1050 Ti
- RAM: 8 GB
- Storage: 50 GB
- OS: Windows 10/11 or Ubuntu 20.04
Shared dependency environment:
# Base dependencies required on every node
pip install paddleocr==2.6.0.3 opencv-python==4.5.5.64
pip install numpy==1.21.6 scipy==1.7.3 shapely==2.0.1
pip install tqdm==4.64.0 python-multipart==0.0.6
# Additional dependencies for the control node
pip install fastapi==0.95.0 uvicorn==0.21.1 pydantic==1.10.7
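Because OCR results must be merged across machines, every node should run identical package versions. A quick sanity check using `importlib.metadata` from the standard library (the expected-version dict below is just a subset of the pins above):

```python
from importlib import metadata

def check_versions(expected: dict) -> list:
    """Return (package, expected, found) tuples for every mismatch;
    `found` is None when the package is not installed at all."""
    mismatches = []
    for pkg, want in expected.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        if have != want:
            mismatches.append((pkg, want, have))
    return mismatches

# Example: verify a couple of the pins from the install commands above
print(check_versions({"numpy": "1.21.6", "tqdm": "4.64.0"}))
```

Run this on each node after installation; an empty list means the environment matches.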
Network Configuration and Security
Port plan:
- Control-node API port: 8000
- Inter-node communication port: 50051
- File transfer: 22 (SSH) / 445 (SMB)
Example firewall rules:
# Control node
sudo ufw allow 8000/tcp
sudo ufw allow 50051/tcp
sudo ufw allow from 192.168.1.0/24 to any port 22
# Worker node
sudo ufw allow 50051/tcp
sudo ufw allow from 192.168.1.100 to any port 22
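Before starting the cluster it is worth verifying that each worker can actually reach the control node's ports through the firewall. A small TCP probe (the host address below is an example matching the subnet above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP connection; True if something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the control node's API and inter-node ports
for port in (8000, 50051):
    print(port, "open" if port_open("192.168.1.100", port) else "CLOSED")
```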
Control Node Implementation (Python)
    # distributed/master_node.py
    import os
    import time
    import hashlib
    import asyncio
    from datetime import datetime
    from typing import List, Tuple

    import uvicorn
    from fastapi import FastAPI, BackgroundTasks
    from pydantic import BaseModel

    app = FastAPI(title="Subtitle Extractor Cluster")

    # In-memory cluster state. This only works with a single server process;
    # for multiple workers, move it to Redis or a database.
    cluster_state = {
        "nodes": {},      # node status
        "tasks": {},      # active tasks
        "completed": {},  # finished tasks
        "failed": {},     # failed tasks
    }

    # Node registration model
    class NodeRegistration(BaseModel):
        node_id: str
        cpu_cores: int
        gpu_available: bool
        memory_gb: float
        ip_address: str
        port: int

    # Task assignment model
    class TaskAssignment(BaseModel):
        task_id: str
        video_path: str
        start_frame: int
        end_frame: int
        language: str = "ch"
        model_version: str = "V4"

    # Node heartbeat model
    class NodeHeartbeat(BaseModel):
        node_id: str
        current_task: str = ""
        cpu_usage: float
        memory_usage: float
        progress: float = 0.0

    @app.post("/register")
    async def register_node(node: NodeRegistration):
        """Register a new node with the cluster."""
        cluster_state["nodes"][node.node_id] = {
            "status": "idle",
            "resources": {
                "cpu_cores": node.cpu_cores,
                "gpu_available": node.gpu_available,
                "memory_gb": node.memory_gb,
            },
            "connection": {
                "ip": node.ip_address,
                "port": node.port,
                "last_seen": datetime.now().timestamp(),
            },
            "performance": {
                "avg_frame_rate": 0.0,
                "success_rate": 1.0,
            },
        }
        return {"status": "success", "node_id": node.node_id}

    @app.post("/heartbeat")
    async def update_heartbeat(heartbeat: NodeHeartbeat):
        """Update a node's heartbeat and task progress."""
        if heartbeat.node_id not in cluster_state["nodes"]:
            return {"status": "error", "message": "Node not registered"}
        node = cluster_state["nodes"][heartbeat.node_id]
        node["connection"]["last_seen"] = datetime.now().timestamp()
        node["status"] = "busy" if heartbeat.current_task else "idle"
        node["current_task"] = heartbeat.current_task
        node["resource_usage"] = {
            "cpu": heartbeat.cpu_usage,
            "memory": heartbeat.memory_usage,
        }
        # Heartbeats carry chunk ids of the form "<task_id>_chunk_<i>";
        # progress is recorded on the chunk, and the parent task is retired
        # once all of its chunks are done.
        if heartbeat.current_task:
            main_id = heartbeat.current_task.split("_chunk_")[0]
            task = cluster_state["tasks"].get(main_id)
            if task and heartbeat.current_task in task["chunks"]:
                chunk = task["chunks"][heartbeat.current_task]
                chunk["progress"] = heartbeat.progress
                if heartbeat.progress >= 1.0 and chunk["status"] != "completed":
                    chunk["status"] = "completed"
                    task["completed_chunks"] += 1
                    if task["completed_chunks"] >= task["total_chunks"]:
                        task["status"] = "completed"
                        cluster_state["completed"][main_id] = cluster_state["tasks"].pop(main_id)
        return {"status": "success"}

    def split_video_into_chunks(video_path: str, num_chunks: int) -> List[Tuple[int, int]]:
        """Split a video into inclusive (start_frame, end_frame) ranges."""
        # A real implementation should read the frame count with OpenCV
        # (cv2.CAP_PROP_FRAME_COUNT); this simplified version assumes a fixed total.
        total_frames = 10000  # placeholder value
        chunk_size = total_frames // num_chunks
        chunks = []
        for i in range(num_chunks):
            start = i * chunk_size
            end = start + chunk_size - 1 if i < num_chunks - 1 else total_frames - 1
            chunks.append((start, end))
        return chunks

    @app.post("/distribute_task")
    async def distribute_task(video_path: str, background_tasks: BackgroundTasks):
        """Distribute a video-processing task across the cluster."""
        # 1. Validate the video file
        if not os.path.exists(video_path):
            return {"status": "error", "message": "Video file not found"}
        # 2. Generate a unique task id
        task_id = hashlib.md5(f"{video_path}_{time.time()}".encode()).hexdigest()[:16]
        # 3. Find available nodes (idle, heartbeat seen within 30 s)
        available_nodes = [
            node_id for node_id, node_info in cluster_state["nodes"].items()
            if node_info["status"] == "idle" and
            (datetime.now().timestamp() - node_info["connection"]["last_seen"] < 30)
        ]
        if not available_nodes:
            return {"status": "error", "message": "No available nodes in cluster"}
        # 4. Split the work
        num_chunks = len(available_nodes)
        frame_chunks = split_video_into_chunks(video_path, num_chunks)
        # 5. Create the task record
        cluster_state["tasks"][task_id] = {
            "video_path": video_path,
            "total_chunks": num_chunks,
            "completed_chunks": 0,
            "chunks": {},
            "status": "distributed",
            "created_at": datetime.now().timestamp(),
            "progress": 0.0,
        }
        # 6. Assign one chunk per node
        for i, node_id in enumerate(available_nodes):
            chunk_id = f"{task_id}_chunk_{i}"
            start_frame, end_frame = frame_chunks[i]
            cluster_state["tasks"][task_id]["chunks"][chunk_id] = {
                "node_id": node_id,
                "start_frame": start_frame,
                "end_frame": end_frame,
                "status": "assigned",
                "progress": 0.0,
            }
            # Push the chunk to the node asynchronously
            background_tasks.add_task(
                send_task_to_node,
                node_id=node_id,
                task_id=chunk_id,
                video_path=video_path,
                start_frame=start_frame,
                end_frame=end_frame,
            )
        return {"status": "success", "task_id": task_id, "chunks": num_chunks}

    async def send_task_to_node(node_id: str, task_id: str, video_path: str,
                                start_frame: int, end_frame: int):
        """Send one chunk to the given node."""
        node_info = cluster_state["nodes"][node_id]
        task = TaskAssignment(
            task_id=task_id,
            video_path=video_path,
            start_frame=start_frame,
            end_frame=end_frame,
        )
        # A real implementation would POST `task` to the node over HTTP or gRPC
        # using node_info["connection"]["ip"] and ["port"]; simplified here.
        try:
            await asyncio.sleep(0.5)  # simulated network round trip
            # Mark the node busy and the chunk as processing
            cluster_state["nodes"][node_id]["status"] = "busy"
            cluster_state["nodes"][node_id]["current_task"] = task_id
            main_id = task_id.split("_chunk_")[0]
            cluster_state["tasks"][main_id]["chunks"][task_id]["status"] = "processing"
            return True
        except Exception as e:
            print(f"Failed to send task to node {node_id}: {e}")
            cluster_state["failed"][task_id] = {
                "error": str(e),
                "timestamp": datetime.now().timestamp(),
            }
            return False

    if __name__ == "__main__":
        # Run a single worker process so the in-memory cluster_state stays
        # consistent; `workers=4` would give each process its own disjoint copy.
        uvicorn.run(app, host="0.0.0.0", port=8000)
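The frame-chunking step can be exercised without a video file or a running server. This is a standalone sketch of the splitting logic using inclusive ranges, with the same 10000-frame placeholder total as the file above:

```python
from typing import List, Tuple

def split_frames(total_frames: int, num_chunks: int) -> List[Tuple[int, int]]:
    """Split frames [0, total_frames) into num_chunks inclusive
    (start, end) ranges; the last chunk absorbs the remainder."""
    chunk_size = total_frames // num_chunks
    chunks = []
    for i in range(num_chunks):
        start = i * chunk_size
        end = start + chunk_size - 1 if i < num_chunks - 1 else total_frames - 1
        chunks.append((start, end))
    return chunks

print(split_frames(10000, 3))  # → [(0, 3332), (3333, 6665), (6666, 9999)]
```

Note the ranges neither overlap nor leave gaps: every frame is assigned to exactly one worker.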
Worker Node Implementation (Python)
    # distributed/slave_node.py
    import os
    import sys
    import json
    import time
    import uuid
    import shutil
    import socket
    import hashlib
    import argparse
    from threading import Thread, Lock

    import cv2
    import psutil
    import requests
    from paddleocr import PaddleOCR

    class SlaveNode:
        def __init__(self, master_ip, master_port, node_id=None):
            self.master_url = f"http://{master_ip}:{master_port}"
            self.node_id = node_id or self.generate_node_id()
            self.current_task = None
            self.task_progress = 0.0
            self.task_lock = Lock()
            self.ocr_engine = None
            self.running = True
            # Collect system information
            self.system_info = {
                "cpu_cores": psutil.cpu_count(logical=True),
                "gpu_available": self.check_gpu_available(),
                "memory_gb": round(psutil.virtual_memory().total / (1024 ** 3), 2),
                "ip_address": self.get_local_ip(),
                "port": 50051,
            }
            # Start the heartbeat thread
            self.heartbeat_thread = Thread(target=self.send_heartbeat, daemon=True)
            self.heartbeat_thread.start()
            # Register with the master node
            self.register_with_master()
            # Initialize the OCR engine
            self.init_ocr_engine()

        def generate_node_id(self):
            """Generate a stable node id from hostname + MAC address."""
            hostname = socket.gethostname()
            mac = ':'.join(['{:02x}'.format((uuid.getnode() >> ele) & 0xff)
                            for ele in range(0, 8 * 6, 8)][::-1])
            return hashlib.md5(f"{hostname}_{mac}".encode()).hexdigest()[:12]

        def get_local_ip(self):
            """Determine the local IP address."""
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            try:
                s.connect(("8.8.8.8", 80))
                ip = s.getsockname()[0]
            except Exception:
                ip = "127.0.0.1"
            finally:
                s.close()
            return ip

        def check_gpu_available(self):
            """Check whether Paddle can use a CUDA GPU."""
            try:
                import paddle
                return paddle.is_compiled_with_cuda() and len(paddle.static.cuda_places()) > 0
            except Exception:
                return False

        def register_with_master(self):
            """Register this node with the master."""
            try:
                response = requests.post(
                    f"{self.master_url}/register",
                    json={
                        "node_id": self.node_id,
                        "cpu_cores": self.system_info["cpu_cores"],
                        "gpu_available": self.system_info["gpu_available"],
                        "memory_gb": self.system_info["memory_gb"],
                        "ip_address": self.system_info["ip_address"],
                        "port": self.system_info["port"],
                    },
                )
                if response.status_code == 200:
                    print(f"Successfully registered with master node. Node ID: {self.node_id}")
                    return True
                print(f"Failed to register with master. Status code: {response.status_code}")
                return False
            except Exception as e:
                print(f"Registration error: {e}")
                return False

        def send_heartbeat(self):
            """Periodically report status to the master."""
            while self.running:
                try:
                    with self.task_lock:
                        current_task = self.current_task or ""
                    # Sample current resource usage
                    cpu_usage = psutil.cpu_percent(interval=1)
                    memory_usage = psutil.virtual_memory().percent
                    response = requests.post(
                        f"{self.master_url}/heartbeat",
                        json={
                            "node_id": self.node_id,
                            "current_task": current_task,
                            "cpu_usage": cpu_usage,
                            "memory_usage": memory_usage,
                            "progress": self.task_progress,
                        },
                    )
                    if response.status_code != 200:
                        print(f"Heartbeat failed. Status code: {response.status_code}")
                    # Poll for newly assigned work
                    self.check_for_new_tasks()
                    # Heartbeat interval: 5 seconds
                    time.sleep(5)
                except Exception as e:
                    print(f"Heartbeat error: {e}")
                    time.sleep(5)

        def check_for_new_tasks(self):
            """Ask the master for a new task (assumes the master exposes a
            GET /node/<id>/task polling endpoint alongside the API above)."""
            with self.task_lock:
                if self.current_task is not None:
                    return
            try:
                response = requests.get(f"{self.master_url}/node/{self.node_id}/task")
                if response.status_code == 200:
                    task_data = response.json()
                    if task_data.get("status") == "assigned":
                        with self.task_lock:
                            self.current_task = task_data["task_id"]
                        self.start_task_processing(task_data)
            except Exception as e:
                print(f"Task check error: {e}")

        def init_ocr_engine(self):
            """Initialize PaddleOCR from the project's config module."""
            try:
                # Model paths and options come from video-subtitle-extractor's config
                import config
                self.ocr_engine = PaddleOCR(
                    use_gpu=config.USE_GPU,
                    det_model_dir=config.DET_MODEL_PATH,
                    rec_model_dir=config.REC_MODEL_PATH,
                    rec_batch_num=config.REC_BATCH_NUM,
                    max_batch_size=config.MAX_BATCH_SIZE,
                    lang=config.REC_CHAR_TYPE,
                    ocr_version=f'PP-OCR{config.MODEL_VERSION.lower()}',
                    rec_image_shape=config.REC_IMAGE_SHAPE,
                )
                print("OCR engine initialized successfully")
            except Exception as e:
                print(f"Failed to initialize OCR engine: {e}")
                raise

        def start_task_processing(self, task_data):
            """Kick off task processing on a worker thread."""
            print(f"Starting task processing: {task_data['task_id']}")
            task_thread = Thread(target=self.process_task, args=(task_data,), daemon=True)
            task_thread.start()

        def process_task(self, task_data):
            """Run OCR over the assigned frame range."""
            try:
                task_id = task_data["task_id"]
                video_path = task_data["video_path"]
                start_frame = task_data["start_frame"]
                end_frame = task_data["end_frame"]
                # Temp directory for intermediate results
                # (on Windows workers, point this somewhere writable)
                temp_dir = f"/tmp/subtitle_cluster/{task_id}"
                os.makedirs(temp_dir, exist_ok=True)
                # Open the video and seek to the start frame
                cap = cv2.VideoCapture(video_path)
                if not cap.isOpened():
                    raise Exception(f"Failed to open video file: {video_path}")
                cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)
                total_frames = end_frame - start_frame + 1
                processed_frames = 0
                recognition_results = []
                for frame_num in range(start_frame, end_frame + 1):
                    ret, frame = cap.read()
                    if not ret:
                        break
                    # Update progress
                    processed_frames += 1
                    self.task_progress = processed_frames / total_frames
                    # Subtitle detection + recognition. PaddleOCR returns, per
                    # image, a list of [box, (text, score)] entries.
                    ocr_result = self.ocr_engine.ocr(frame)
                    lines = ocr_result[0] if ocr_result else []
                    if lines:
                        recognition_results.append({
                            "frame": frame_num,
                            "timestamp": cap.get(cv2.CAP_PROP_POS_MSEC) / 1000,
                            "boxes": [line[0] for line in lines],
                            "texts": [{"text": line[1][0], "score": line[1][1]}
                                      for line in lines],
                        })
                    # Checkpoint intermediate results every 100 frames
                    if processed_frames % 100 == 0:
                        with open(f"{temp_dir}/partial_{processed_frames}.json",
                                  "w", encoding="utf-8") as f:
                            json.dump(recognition_results, f, ensure_ascii=False, indent=2)
                # Release the capture
                cap.release()
                # Save the final result set
                with open(f"{temp_dir}/final_results.json", "w", encoding="utf-8") as f:
                    json.dump(recognition_results, f, ensure_ascii=False, indent=2)
                # Ship results back to the master
                self.upload_task_results(task_id, temp_dir)
                # Once the upload is confirmed, the temp dir can be removed:
                # shutil.rmtree(temp_dir)
                print(f"Task completed successfully: {task_id}")
            except Exception as e:
                print(f"Task processing error: {e}")
                self.report_task_failure(task_data["task_id"], str(e))
            finally:
                with self.task_lock:
                    self.current_task = None
                    self.task_progress = 0.0

        def upload_task_results(self, task_id, result_dir):
            """Upload the result archive to the master (assumes the master
            exposes a POST /task/<id>/upload endpoint)."""
            try:
                # Pack the result directory; shutil is portable and avoids
                # depending on an external `zip` binary
                zip_path = shutil.make_archive(f"{task_id}_results", "zip", result_dir)
                with open(zip_path, "rb") as f:
                    response = requests.post(
                        f"{self.master_url}/task/{task_id}/upload",
                        files={"result_file": f},
                    )
                if response.status_code == 200:
                    print(f"Results uploaded successfully for task {task_id}")
                    return True
                print(f"Failed to upload results. Status code: {response.status_code}")
                return False
            except Exception as e:
                print(f"Result upload error: {e}")
                return False

        def report_task_failure(self, task_id, error_msg):
            """Report a failed task to the master."""
            try:
                requests.post(
                    f"{self.master_url}/task/{task_id}/fail",
                    json={"error": error_msg},
                )
            except Exception as e:
                print(f"Failed to report task failure: {e}")

        def stop(self):
            """Stop the node service."""
            self.running = False
            if self.heartbeat_thread.is_alive():
                self.heartbeat_thread.join()
            print("Slave node stopped gracefully")

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Subtitle Extractor Cluster Slave Node")
        parser.add_argument("--master-ip", required=True, help="Master node IP address")
        parser.add_argument("--master-port", type=int, default=8000, help="Master node port")
        args = parser.parse_args()
        node = SlaveNode(master_ip=args.master_ip, master_port=args.master_port)
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            node.stop()
            sys.exit(0)
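The hostname-plus-MAC hashing used by `generate_node_id` is deterministic, and that matters: a worker that restarts re-registers under the same id instead of appearing to the master as a brand-new node. The scheme can be checked in isolation with fixed inputs (the hostname and MAC below are examples):

```python
import hashlib

def node_id_for(hostname: str, mac: str) -> str:
    """Same scheme as SlaveNode.generate_node_id: md5("host_mac")[:12]."""
    return hashlib.md5(f"{hostname}_{mac}".encode()).hexdigest()[:12]

a = node_id_for("worker-01", "00:1a:2b:3c:4d:5e")
b = node_id_for("worker-01", "00:1a:2b:3c:4d:5e")
assert a == b and len(a) == 12  # stable across restarts
print(a)
```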
Task Allocation and Load Balancing
Smart Task Scheduling
Dynamic task allocation driven by each node's resource profile:
    # distributed/scheduler.py
    from typing import Dict, List

    # Shared in-process cluster state (assumes the scheduler runs
    # inside the master process)
    from master_node import cluster_state

    def allocate_tasks_optimally(video_path: str, frame_count: int) -> List[Dict]:
        """Allocate frame ranges to nodes in proportion to their resources."""

        def node_weight(info: Dict) -> int:
            # GPU nodes are weighted at twice their core count
            return info["resources"]["cpu_cores"] * (2 if info["resources"]["gpu_available"] else 1)

        # 1. Sort nodes by capability: GPU first, then CPU cores,
        #    then memory, then historical frame rate (fastest first)
        sorted_nodes = sorted(
            cluster_state["nodes"].items(),
            key=lambda x: (
                1 if x[1]["resources"]["gpu_available"] else 0,
                x[1]["resources"]["cpu_cores"],
                x[1]["resources"]["memory_gb"],
                x[1]["performance"]["avg_frame_rate"],
            ),
            reverse=True,
        )

        # 2. Total weight across the cluster
        total_weight = sum(node_weight(info) for _, info in sorted_nodes)

        # 3. Hand out frame ranges proportionally to each node's weight
        frame_allocation = []
        remaining_frames = frame_count
        for node_id, node_info in sorted_nodes:
            if remaining_frames <= 0:
                break
            allocation_ratio = node_weight(node_info) / total_weight
            # At least 100 frames per node, never more than what remains
            allocated_frames = max(int(frame_count * allocation_ratio), 100)
            allocated_frames = min(allocated_frames, remaining_frames)
            frame_allocation.append({
                "node_id": node_id,
                "start_frame": frame_count - remaining_frames,
                "end_frame": frame_count - remaining_frames + allocated_frames - 1,
            })
            remaining_frames -= allocated_frames

        # 4. Any rounding leftovers go to the last node
        if remaining_frames > 0 and frame_allocation:
            frame_allocation[-1]["end_frame"] += remaining_frames
        return frame_allocation
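The weighting rule (a GPU node counts for twice its core count) can be sanity-checked in isolation. The node specs below are made up for illustration:

```python
def node_weight(cpu_cores: int, gpu: bool) -> int:
    """GPU nodes are weighted at twice their core count."""
    return cpu_cores * (2 if gpu else 1)

# Hypothetical cluster: (cores, has_gpu)
nodes = {"gpu-box": (8, True), "cpu-box": (8, False), "laptop": (4, False)}
weights = {n: node_weight(c, g) for n, (c, g) in nodes.items()}
total = sum(weights.values())  # 16 + 8 + 4 = 28

# Share of a 10000-frame video each node would receive
shares = {n: round(10000 * w / total) for n, w in weights.items()}
print(shares)  # → {'gpu-box': 5714, 'cpu-box': 2857, 'laptop': 1429}
```

The GPU box ends up with more frames than the two CPU machines combined, which is the intended behavior.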
Dynamic Load-Balancing Strategy
A runtime load-adjustment loop:
    # distributed/load_balancer.py
    import time

    # Shared in-process cluster state; notify_task_migration and
    # assign_new_task are assumed to live alongside the master's
    # task-dispatch code.
    from master_node import cluster_state

    def dynamic_load_balancing():
        """Rebalance work between overloaded and underloaded nodes."""

        def usage(node_id, key, default):
            # Keys match what the master records from heartbeats
            return cluster_state["nodes"][node_id].get("resource_usage", {}).get(key, default)

        # Re-check load balance every 30 seconds
        while True:
            # 1. Overloaded nodes: CPU > 80% or memory > 85%
            overloaded_nodes = [
                node_id for node_id, info in cluster_state["nodes"].items()
                if info["status"] == "busy" and
                   (usage(node_id, "cpu", 0) > 80 or usage(node_id, "memory", 0) > 85)
            ]
            # 2. Underloaded nodes: CPU < 40% and memory < 50%
            underloaded_nodes = [
                node_id for node_id, info in cluster_state["nodes"].items()
                if info["status"] == "busy" and
                   usage(node_id, "cpu", 100) < 40 and usage(node_id, "memory", 100) < 50
            ]
            # 3. Migrate work from overloaded to underloaded nodes
            for overloaded_node in overloaded_nodes:
                # Find the chunk this node is working on
                current_task_id = cluster_state["nodes"][overloaded_node].get("current_task")
                if not current_task_id:
                    continue
                main_task_id = current_task_id.split("_chunk_")[0]
                if main_task_id not in cluster_state["tasks"]:
                    continue
                chunks = cluster_state["tasks"][main_task_id]["chunks"]
                original_task = chunks.get(current_task_id)
                if original_task is None:
                    continue
                task_progress = original_task.get("progress", 0.0)
                # Only migrate if less than half done and a target exists
                if task_progress < 0.5 and underloaded_nodes:
                    target_node = underloaded_nodes.pop(0)
                    # Save the original end before shrinking the chunk
                    original_end = original_task["end_frame"]
                    mid_frame = int(original_task["start_frame"] +
                                    (original_end - original_task["start_frame"]) * task_progress)
                    # Shrink the original chunk and mark it migrated
                    original_task["end_frame"] = mid_frame
                    original_task["status"] = "migrated"
                    # Create a new chunk for the remaining frames
                    new_chunk_id = f"{main_task_id}_chunk_{len(chunks)}"
                    chunks[new_chunk_id] = {
                        "node_id": target_node,
                        "start_frame": mid_frame + 1,
                        "end_frame": original_end,
                        "status": "assigned",
                        "progress": 0.0,
                    }
                    # Tell the overloaded node to stop at mid_frame,
                    # then hand the tail to the underloaded node
                    notify_task_migration(overloaded_node, current_task_id, mid_frame)
                    assign_new_task(target_node, new_chunk_id, main_task_id,
                                    mid_frame + 1, original_end)
                    print(f"Migrated task {current_task_id} from {overloaded_node} to {target_node}")
            # Check interval
            time.sleep(30)
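The split point used during migration is plain linear interpolation over the chunk's frame range, and the boundary cases are worth checking: at zero progress the overloaded node keeps only its first frame, and nothing past the chunk's end can ever be handed off.

```python
def split_point(start_frame: int, end_frame: int, progress: float) -> int:
    """Frame at which a migrating chunk is cut: the overloaded node keeps
    [start, mid], the target node takes over [mid + 1, end]."""
    return int(start_frame + (end_frame - start_frame) * progress)

print(split_point(1000, 2000, 0.3))  # → 1300
print(split_point(1000, 2000, 0.0))  # → 1000 (barely started)
```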
Result Merging and Post-processing
The Distributed Result-Merging Algorithm
    # distributed/result_merger.py
    import os
    import json
    from typing import Dict, List

    # Shared in-process cluster state
    from master_node import cluster_state

    def merge_subtitle_results(task_id: str, output_path: str):
        """Merge the subtitle results produced by the distributed chunks."""
        # 1. Collect results from every completed chunk
        chunk_results = []
        for chunk_id, chunk_info in cluster_state["tasks"][task_id]["chunks"].items():
            if chunk_info["status"] == "completed":
                chunk_file = os.path.join("/tmp", f"{chunk_id}_results.json")
                with open(chunk_file, "r", encoding="utf-8") as f:
                    results = json.load(f)
                chunk_results.append((chunk_info["start_frame"], results))

        # 2. Sort chunks by their starting frame
        chunk_results.sort(key=lambda x: x[0])

        # 3. Merge and de-duplicate
        merged_frames = []
        previous_text = ""
        subtitle_buffer = []
        for start_frame, chunk_data in chunk_results:
            for frame in chunk_data:
                # Join all recognized text on this frame
                current_text = " ".join(item["text"] for item in frame["texts"])
                # De-duplicate using a dynamic similarity threshold
                similarity = text_similarity(previous_text, current_text)
                threshold = calculate_dynamic_threshold(previous_text, current_text)
                if similarity < threshold:
                    # A new subtitle line starts here
                    if subtitle_buffer:
                        merged_frames.append(merge_buffer(subtitle_buffer))
                    subtitle_buffer = [frame]
                    previous_text = current_text
                else:
                    # Similar text: extend the current line
                    subtitle_buffer.append(frame)
        # Flush the last buffered line
        if subtitle_buffer:
            merged_frames.append(merge_buffer(subtitle_buffer))

        # 4. Write the SRT file
        generate_srt(merged_frames, output_path)
        return output_path

    def text_similarity(text1: str, text2: str) -> float:
        """Text similarity (simplified)."""
        if not text1 or not text2:
            return 0.0
        # Edit-distance-based ratio
        from difflib import SequenceMatcher
        return SequenceMatcher(None, text1, text2).ratio()

    def calculate_dynamic_threshold(text1: str, text2: str) -> float:
        """Similarity threshold that scales with text length."""
        min_length = min(len(text1), len(text2))
        if min_length < 5:
            return 0.5   # short text: tolerate more variation
        elif min_length < 15:
            return 0.7   # medium-length text
        return 0.85      # long text: demand high similarity

    def merge_buffer(buffer: List[Dict]) -> Dict:
        """Merge a run of similar frames into one subtitle entry."""
        if not buffer:
            return {}
        # Majority vote decides the final text
        text_candidates = [" ".join(item["text"] for item in frame["texts"]) for frame in buffer]
        most_common_text = max(set(text_candidates), key=text_candidates.count)
        # Time span covered by the run
        start_time = buffer[0]["timestamp"]
        end_time = buffer[-1]["timestamp"] + 1.0  # pad the end by one second
        return {
            "start_time": start_time,
            "end_time": end_time,
            "text": most_common_text,
        }

    def format_timestamp(seconds: float) -> str:
        """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    def generate_srt(merged_frames: List[Dict], output_path: str):
        """Write the merged subtitle entries as an SRT file."""
        with open(output_path, "w", encoding="utf-8") as f:
            for i, frame in enumerate(merged_frames, 1):
                start_time = format_timestamp(frame["start_time"])
                end_time = format_timestamp(frame["end_time"])
                # One SRT entry: index, time range, text, blank line
                f.write(f"{i}\n")
                f.write(f"{start_time} --> {end_time}\n")
                f.write(f"{frame['text']}\n\n")
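The de-duplication decision above combines `SequenceMatcher` similarity with a length-dependent threshold. Its behavior on a typical case of OCR jitter, where one character flips between consecutive frames, looks like this:

```python
from difflib import SequenceMatcher

def text_similarity(text1: str, text2: str) -> float:
    """Edit-distance-based similarity; 0.0 when either string is empty."""
    if not text1 or not text2:
        return 0.0
    return SequenceMatcher(None, text1, text2).ratio()

def calculate_dynamic_threshold(text1: str, text2: str) -> float:
    """Shorter texts get a looser threshold, longer texts a stricter one."""
    min_length = min(len(text1), len(text2))
    if min_length < 5:
        return 0.5
    elif min_length < 15:
        return 0.7
    return 0.85

# One character of OCR jitter on a 15-character line is still "the same" subtitle
a, b = "Hello world out", "Hello world oub"
sim = text_similarity(a, b)
assert sim >= calculate_dynamic_threshold(a, b)  # merged into one subtitle line
print(round(sim, 3))
```

This is why short subtitles use the looser 0.5 threshold: a single differing character in a 3-character line already drops the ratio well below 0.85, and a strict threshold would wrongly split one subtitle into many.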