Adding an object detection pipeline on the RK3588 with the MediaPipe framework
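Introduction
The previous article covered how to add a new calculator and graph to MediaPipe, with a worked example of an RGA calculator and graph on the RK3588 platform. This article goes one step further and adds an object detection graph on the RK3588.
1. Defining the data flow
Let's start straight from the data flow definition in the pbtxt. The graph works as follows:
The input video stream passes through the FlowLimiter throttle and is then sent to the RGA module, which scales it to the 640x640 size the yolov5 model expects. The scaled frame goes to RknnYolov5 for inference on the NPU, and PostProcess decodes the inference result into bounding boxes, classes, and weighted confidences (class probability multiplied by box confidence). Finally, these results are drawn on top of the source frame output by FlowLimiter and displayed.
When RknnYolov5 finishes a frame, its output also serves as the FINISHED signal for FlowLimiter, telling it that the next frame can be admitted for inference. RknnYolov5 additionally emits a static side packet containing the scale factors and zero points of the model's quantized outputs, which PostProcess uses. The details are covered later. The proto config file for this graph is: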
# MediaPipe graph that performs object detection with yolov5 on rk3588
# Used in the examples in
# mediapipe/examples/desktop/rknn_yolov5:rknn_yolov5
# Images on CPU coming into and out of the graph.
input_stream: "input_video"
output_stream: "output_video"
# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for RknnYolov5Calculator
# downstream in the graph to finish inference on it before it passes through
# another image. All images that come in while waiting are dropped, limiting
# the number of in-flight images between this calculator and
# RknnYolov5Calculator to 1. This prevents the nodes in between from queuing up
# incoming images and data excessively, which leads to increased latency and
# memory usage, unwanted in real-time applications. It also eliminates
# unnecessary computation, e.g., an image scaled by RgaCalculator may get
# dropped downstream if RknnYolov5Calculator is still busy processing previous
# inputs.
node {
calculator: "FlowLimiterCalculator"
input_stream: "input_video"
input_stream: "FINISHED:rknnoutput"
input_stream_info: {
tag_index: "FINISHED"
back_edge: true
}
output_stream: "throttled_input_video"
}
# Transforms the input image on RGA to a 640x640 image.
node {
calculator: "RgaCalculator"
input_stream: "IMAGE:throttled_input_video"
output_stream: "IMAGE:transformed_input_video"
node_options: {
[type.googleapis.com/mediapipe.RgaCalculatorOptions] {
output_width: 640
output_height: 640
}
}
}
# Runs a rknn model on npu
node {
calculator: "RknnYolov5Calculator"
input_stream: "IMAGE:transformed_input_video"
output_side_packet: "SCALEZPS:scalezps"
output_stream: "RKNNOUTPUT:rknnoutput"
node_options: {
[type.googleapis.com/mediapipe.RknnYolov5CalculatorOptions] {
model_path: "mediapipe/models/yolov5s-640-640.rknn"
}
}
}
# Decodes the rknn outputs, performs non-max suppression to remove excessive
# detections, and draws the result on the throttled source image.
node {
calculator: "PostProcessCalculator"
input_side_packet: "SCALEZPS:scalezps"
input_stream: "RKNNOUTPUT:rknnoutput"
input_stream: "IMAGE:throttled_input_video"
output_stream: "IMAGE:output_video"
node_options: {
[type.googleapis.com/mediapipe.PostProcessCalculatorOptions] {
box_conf_threshold: 0.45
nms_threshold: 0.25
label_map_path: "mediapipe/models/coco_80_labels_list.txt"
}
}
}
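The back edge in the graph above (FINISHED wired to rknnoutput with back_edge: true) is what keeps at most one frame in flight. As a rough mental model, the sketch below mimics that behavior: a frame is forwarded only while nothing is in flight, and every other frame is dropped until a FINISHED signal arrives. This is a toy illustration, not MediaPipe's actual FlowLimiterCalculator implementation, and the class ToyFlowLimiter and its methods are made up for this example.

```cpp
#include <cstdint>
#include <vector>

// Toy model of a flow limiter that allows at most one frame in flight.
// Hypothetical class for illustration only; not part of MediaPipe.
class ToyFlowLimiter {
 public:
  // Returns true if the frame is forwarded downstream, false if dropped.
  bool OnFrame(int64_t timestamp) {
    if (in_flight_) {
      ++dropped_;
      return false;
    }
    in_flight_ = true;
    forwarded_.push_back(timestamp);
    return true;
  }

  // Called when the downstream node signals FINISHED for the frame in flight.
  void OnFinished() { in_flight_ = false; }

  const std::vector<int64_t>& forwarded() const { return forwarded_; }
  int dropped() const { return dropped_; }

 private:
  bool in_flight_ = false;
  int dropped_ = 0;
  std::vector<int64_t> forwarded_;
};
```

With frames arriving at timestamps 0, 1, 2 and inference on frame 0 still running, only frame 0 is forwarded; frames 1 and 2 are dropped, and the next frame after OnFinished() is admitted again.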
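2. Adding the calculators and graph
2.1 Adding RknnYolov5
2.1.1 Options file
Add rknn_yolov5_calculator.proto to the mediapipe/calculators/rknn folder. Its purpose is to expose the model path as a configurable option. Its content is: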
// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
package mediapipe;
import "mediapipe/framework/calculator.proto";
message RknnYolov5CalculatorOptions {
// Path to the rknn yolo model
optional string model_path = 1;
}
Create a BUILD file in the mediapipe/calculators/rknn folder and add:
mediapipe_proto_library(
name = "rknn_yolov5_calculator_proto",
srcs = ["rknn_yolov5_calculator.proto"],
deps = [
"//mediapipe/framework:calculator_options_proto",
"//mediapipe/framework:calculator_proto",
],
)
In mediapipe/framework/tool/mediapipe_proto_allowlist.bzl, add the following entry to rewrite_target_list:
"rknn_yolov5_calculator_proto",
With that in place, the model path field can be used in code after the next build.
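Once the path is available, the calculator has to read the model file into memory before handing it to the runtime. The calculator listing later in this article does this with fopen/fread and malloc. As an alternative sketch, the helper below reads the file into a std::vector so the buffer is released automatically. The name ReadFileToBuffer is hypothetical and does not appear in the original code.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole binary file into `out`. Returns false if the file cannot be
// opened. Hypothetical helper for illustration.
bool ReadFileToBuffer(const std::string& path, std::vector<uint8_t>* out) {
  std::ifstream file(path, std::ios::binary);
  if (!file) {
    return false;
  }
  // istreambuf_iterator reads raw bytes without skipping whitespace.
  out->assign(std::istreambuf_iterator<char>(file),
              std::istreambuf_iterator<char>());
  return true;
}
```

The resulting buffer's data() and size() could then be passed wherever a pointer and length are expected, as long as the vector outlives that use.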
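2.1.2 Header definition
Because this article runs yolo inference on the RK3588 platform, a few platform-specific data structures are needed so that the two calculators can pass data to each other.
Add output.h under mediapipe/framework/formats/rknn with the following content: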
// Copyright 2020 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MEDIAPIPE_FRAMEWORK_FORMATS_RKNN_H_
#define MEDIAPIPE_FRAMEWORK_FORMATS_RKNN_H_
#include <algorithm>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <memory>
#include <numeric>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>
#include "rknn_api.h"
namespace mediapipe {
#define MAX_RKNN_OUTPUT_NUM 5
#define OBJ_NAME_MAX_SIZE 16
#define OBJ_NUMB_MAX_SIZE 64
#define OBJ_CLASS_NUM 80
#define PROP_BOX_SIZE (5+OBJ_CLASS_NUM)
typedef struct _RknnOutputs
{
uint32_t num; /* the num of outputs*/
rknn_output outputs[MAX_RKNN_OUTPUT_NUM];
} RknnOutputs;
typedef struct _ScaleZps
{
std::vector<float> out_scales;
std::vector<int32_t> out_zps;
} ScaleZps;
typedef struct _BOX_RECT
{
int left;
int right;
int top;
int bottom;
} BOX_RECT;
typedef struct __detect_result_t
{
char name[OBJ_NAME_MAX_SIZE];
BOX_RECT box;
float prop;
} detect_result_t;
typedef struct _detect_result_group_t
{
int id;
int count;
detect_result_t results[OBJ_NUMB_MAX_SIZE];
} detect_result_group_t;
} // namespace mediapipe
#endif // MEDIAPIPE_FRAMEWORK_FORMATS_RKNN_H_
Also add a BUILD file in mediapipe/framework/formats/rknn that exports the header:
# Copyright 2019 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
package(
default_visibility = ["//visibility:private"],
features = ["-layering_check"],
)
licenses(["notice"])
exports_files([
"output.h",
])
Both calculators can now use the custom data structures defined in this header.
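To make the role of ScaleZps concrete: each output branch of the quantized model carries a scale and a zero point, and a raw int8 value maps back to a real number through affine dequantization, real = (q - zp) * scale. The sketch below mirrors the ScaleZps struct so that it compiles on its own. The helper DequantizeOutput is a hypothetical name for illustration; the formula is the same as deqnt_affine_to_f32 in the PostProcess listing later on.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the ScaleZps struct from output.h so the sketch is self-contained.
struct ScaleZps {
  std::vector<float> out_scales;
  std::vector<int32_t> out_zps;
};

// Affine dequantization: real = (q - zero_point) * scale.
float Dequantize(int8_t q, int32_t zp, float scale) {
  return (static_cast<float>(q) - static_cast<float>(zp)) * scale;
}

// Dequantizes one raw value that came from output branch `index`.
// Hypothetical helper; assumes `index` is within range.
float DequantizeOutput(const ScaleZps& params, size_t index, int8_t q) {
  return Dequantize(q, params.out_zps[index], params.out_scales[index]);
}
```

For example, with scale 0.5 and zero point -10, the raw value 10 dequantizes to (10 - (-10)) * 0.5 = 10.0.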
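2.1.3 Calculator implementation
The calculator again derives from CalculatorBase and implements GetContract, Open, Process, and Close. GetContract checks and declares the input and output types. Open loads the yolov5 model, initializes the rknn runtime, dumps some model information, and finally emits the scale factors and zero points of the model's quantized outputs as a side packet for the next node. Process runs inference on the NPU for each input image, copies the result into an RknnOutputs struct, and sends it to the next node. Close releases the loaded model data and the rknn context. The code is as follows: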
// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
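#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include "absl/log/absl_log.h"
#include "absl/status/status.h"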
#include "mediapipe/calculators/rknn/rknn_yolov5_calculator.pb.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/formats/video_stream_header.h"
#include "mediapipe/framework/formats/rknn/output.h"
#include "mediapipe/framework/packet.h"
#include "mediapipe/framework/port/opencv_core_inc.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/framework/port/status.h"
#include "mediapipe/framework/timestamp.h"
#include "rknn_api.h"
typedef int DimensionsPacketType[2];
#define DEFAULT_SCALE_MODE mediapipe::ScaleMode_Mode_STRETCH
namespace mediapipe {
namespace {
constexpr char kImageFrameTag[] = "IMAGE";
constexpr char kRknnOutputTag[] = "RKNNOUTPUT";
constexpr char kScaleZpsTag[] = "SCALEZPS";
} // namespace
class RknnYolov5Calculator : public CalculatorBase {
public:
RknnYolov5Calculator() = default;
~RknnYolov5Calculator() override = default;
static absl::Status GetContract(CalculatorContract* cc);
absl::Status Open(CalculatorContext* cc) override;
absl::Status Process(CalculatorContext* cc) override;
absl::Status Close(CalculatorContext* cc) override;
private:
::mediapipe::RknnYolov5CalculatorOptions options_;
// char *model_path_ = nullptr;
rknn_context ctx_;
unsigned char* model_data_ = nullptr;
rknn_input_output_num io_num_;
int input_channel_ = 3;
int input_width_ = 0;
int input_height_ = 0;
unsigned char* load_data(FILE* fp, size_t ofst, size_t sz);
unsigned char* load_model(const char* filename, int* model_size);
void dump_tensor_attr(rknn_tensor_attr* attr);
};
REGISTER_CALCULATOR(RknnYolov5Calculator);
// static
absl::Status RknnYolov5Calculator::GetContract(CalculatorContract* cc)
{
const auto& options = cc->Options<::mediapipe::RknnYolov5CalculatorOptions>();
RET_CHECK(!options.model_path().empty())
<< "model_path in options is required.";
// Side packets
cc->OutputSidePackets().Tag(kScaleZpsTag).Set<ScaleZps>();
// Only one input can be set, and the output type must match.
RET_CHECK(cc->Inputs().HasTag(kImageFrameTag));
if (cc->Inputs().HasTag(kImageFrameTag))
{
RET_CHECK(cc->Outputs().HasTag(kRknnOutputTag));
cc->Inputs().Tag(kImageFrameTag).Set<ImageFrame>();
cc->Outputs().Tag(kRknnOutputTag).Set<RknnOutputs>();
}
// Assign this calculator's default InputStreamHandler.
cc->SetInputStreamHandler("FixedSizeInputStreamHandler");
return absl::OkStatus();
}
void RknnYolov5Calculator::dump_tensor_attr(rknn_tensor_attr* attr)
{
std::string shape_str = attr->n_dims < 1 ? "" : std::to_string(attr->dims[0]);
for (int i = 1; i < attr->n_dims; ++i) {
shape_str += ", " + std::to_string(attr->dims[i]);
}
printf(" index=%d, name=%s, n_dims=%d, dims=[%s], n_elems=%d, size=%d, w_stride = %d, size_with_stride=%d, fmt=%s, "
"type=%s, qnt_type=%s, "
"zp=%d, scale=%f\n",
attr->index, attr->name, attr->n_dims, shape_str.c_str(), attr->n_elems, attr->size, attr->w_stride,
attr->size_with_stride, get_format_string(attr->fmt), get_type_string(attr->type),
get_qnt_type_string(attr->qnt_type), attr->zp, attr->scale);
}
unsigned char* RknnYolov5Calculator::load_data(FILE* fp, size_t ofst, size_t sz)
{
unsigned char* data;
int ret;
data = NULL;
if (NULL == fp) {
return NULL;
}
ret = fseek(fp, ofst, SEEK_SET);
if (ret != 0) {
printf("blob seek failure.\n");
return NULL;
}
data = (unsigned char*)malloc(sz);
if (data == NULL) {
printf("buffer malloc failure.\n");
return NULL;
}
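ret = fread(data, 1, sz, fp);
// A short read means the model blob is incomplete; do not hand it to rknn_init.
if ((size_t)ret != sz) {
printf("blob read failure.\n");
free(data);
return NULL;
}
return data;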
}
unsigned char* RknnYolov5Calculator::load_model(const char* filename, int* model_size)
{
FILE* fp;
unsigned char* data;
fp = fopen(filename, "rb");
if (NULL == fp)
{
printf("Open file %s failed.\n", filename);
return NULL;
}
fseek(fp, 0, SEEK_END);
int size = ftell(fp);
data = load_data(fp, 0, size);
fclose(fp);
*model_size = size;
return data;
}
absl::Status RknnYolov5Calculator::Open(CalculatorContext* cc)
{
options_ = cc->Options<::mediapipe::RknnYolov5CalculatorOptions>();
// model_path_ = options_.model_path().c_str();
// Create the neural network
int ret = 0;
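ABSL_LOG(INFO) << "Loading model...";
int model_data_size = 0;
model_data_ = load_model(options_.model_path().c_str(), &model_data_size);
// load_model returns NULL when the file cannot be opened or read.
if (model_data_ == nullptr)
{
return absl::NotFoundError("failed to load rknn model file");
}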
ret = rknn_init(&ctx_, model_data_, model_data_size, 0, NULL);
if (ret < 0)
{
printf("rknn_init error ret=: %d\n", ret);
return absl::UnavailableError("rknn_init error");
}
rknn_sdk_version version;
ret = rknn_query(ctx_, RKNN_QUERY_SDK_VERSION, &version, sizeof(rknn_sdk_version));
if (ret < 0)
{
printf("rknn_query error ret=%d\n", ret);
return absl::UnavailableError("rknn_query error");
}
printf("sdk version: %s driver version: %s\n", version.api_version, version.drv_version);
ret = rknn_query(ctx_, RKNN_QUERY_IN_OUT_NUM, &io_num_, sizeof(io_num_));
if (ret < 0)
{
printf("rknn_query error ret=%d\n", ret);
return absl::UnavailableError("rknn_query error");
}
printf("model input num: %d, output num: %d\n", io_num_.n_input, io_num_.n_output);
rknn_tensor_attr input_attrs[io_num_.n_input];
memset(input_attrs, 0, sizeof(input_attrs));
for (int i = 0; i < io_num_.n_input; i++)
{
input_attrs[i].index = i;
ret = rknn_query(ctx_, RKNN_QUERY_INPUT_ATTR, &(input_attrs[i]), sizeof(rknn_tensor_attr));
if (ret < 0) {
printf("rknn_query error ret=%d\n", ret);
return absl::UnavailableError("rknn_query error");
}
dump_tensor_attr(&(input_attrs[i]));
}
rknn_tensor_attr output_attrs[io_num_.n_output];
memset(output_attrs, 0, sizeof(output_attrs));
for (int i = 0; i < io_num_.n_output; i++)
{
output_attrs[i].index = i;
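ret = rknn_query(ctx_, RKNN_QUERY_OUTPUT_ATTR, &(output_attrs[i]), sizeof(rknn_tensor_attr));
if (ret < 0) {
printf("rknn_query error ret=%d\n", ret);
return absl::UnavailableError("rknn_query error");
}
dump_tensor_attr(&(output_attrs[i]));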
}
if (input_attrs[0].fmt == RKNN_TENSOR_NCHW)
{
printf("model is NCHW input fmt\n");
input_channel_ = input_attrs[0].dims[1];
input_height_ = input_attrs[0].dims[2];
input_width_ = input_attrs[0].dims[3];
}
else
{
printf("model is NHWC input fmt\n");
input_height_ = input_attrs[0].dims[1];
input_width_ = input_attrs[0].dims[2];
input_channel_ = input_attrs[0].dims[3];
}
printf("model input height=%d, width=%d, channel=%d\n", input_height_, input_width_, input_channel_);
// Pass side packets
ScaleZps scale_zps;
for (int i = 0; i < io_num_.n_output; ++i)
{
scale_zps.out_scales.push_back(output_attrs[i].scale);
scale_zps.out_zps.push_back(output_attrs[i].zp);
}
cc->OutputSidePackets().Tag(kScaleZpsTag).Set(MakePacket<ScaleZps>(scale_zps));
return absl::OkStatus();
}
absl::Status RknnYolov5Calculator::Process(CalculatorContext* cc)
{
// Convert ImageFrame to OpenCV Mat
const auto& input_frame = cc->Inputs().Tag(kImageFrameTag).Get<ImageFrame>();
cv::Mat input_mat = mediapipe::formats::MatView(&input_frame);
int img_chns = input_mat.channels();
int img_width = input_mat.cols;
int img_height = input_mat.rows;
if (img_width != input_width_ || img_height != input_height_ || img_chns != input_channel_)
{
return absl::UnavailableError("Image size is not correct");
}
// Set rknn input and output
rknn_input inputs[1];
memset(inputs, 0, sizeof(inputs));
inputs[0].index = 0;
inputs[0].type = RKNN_TENSOR_UINT8;
inputs[0].size = input_width_ * input_height_ * input_channel_;
inputs[0].fmt = RKNN_TENSOR_NHWC;
inputs[0].pass_through = 0;
// Check if input_mat is valid
if (input_mat.empty() || !input_mat.isContinuous())
{
return absl::InternalError("Invalid input image data");
}
inputs[0].buf = (void*)input_mat.data;
// Run the rknn
int ret = rknn_inputs_set(ctx_, io_num_.n_input, inputs);
if (ret < 0)
{
return absl::InternalError("Failed to set rknn inputs");
}
rknn_output outputs[io_num_.n_output];
memset(outputs, 0, sizeof(outputs));
for (int i = 0; i < io_num_.n_output; i++)
{
outputs[i].want_float = 0;
}
ret = rknn_run(ctx_, NULL);
if (ret < 0)
{
return absl::InternalError("Failed to run rknn");
}
ret = rknn_outputs_get(ctx_, io_num_.n_output, outputs, nullptr);
if (ret < 0)
{
return absl::InternalError("Failed to get rknn outputs");
}
// Output result
std::unique_ptr<mediapipe::RknnOutputs> rknn_outputs(new mediapipe::RknnOutputs());
rknn_outputs->num = io_num_.n_output;
for (int i = 0; i < io_num_.n_output; i++)
{
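if (!outputs[i].buf || outputs[i].size <= 0)
{
rknn_outputs_release(ctx_, io_num_.n_output, outputs);
return absl::InternalError("Invalid output buffer or size");
}
memcpy(&rknn_outputs->outputs[i], &outputs[i], sizeof(rknn_output));
// Deep-copy the tensor data: the runtime-owned buffer is handed back by
// rknn_outputs_release below, so the packet needs its own copy.
rknn_outputs->outputs[i].buf = calloc(1, outputs[i].size);
if (!rknn_outputs->outputs[i].buf)
{
rknn_outputs_release(ctx_, io_num_.n_output, outputs);
return absl::ResourceExhaustedError("Failed to allocate output buffer");
}
memcpy(rknn_outputs->outputs[i].buf, outputs[i].buf, outputs[i].size);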
}
cc->Outputs().Tag(kRknnOutputTag).Add(rknn_outputs.release(), cc->InputTimestamp());
// Release rknn outputs
rknn_outputs_release(ctx_, io_num_.n_output, outputs);
return absl::OkStatus();
}
absl::Status RknnYolov5Calculator::Close(CalculatorContext* cc)
{
// release
int ret = rknn_destroy(ctx_);
if (ret < 0)
{
return absl::UnknownError("Failed to destroy RKNN context");
}
if (model_data_)
{
free(model_data_);
model_data_ = nullptr;
}
return absl::OkStatus();
}
} // namespace mediapipe
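One thing to keep in mind about the Process listing above: each output tensor is copied into a calloc'd buffer, and RknnOutputs has no destructor, so unless the consumer frees those buffers explicitly they are leaked on every frame. A possible alternative, sketched below under stated assumptions, is to let the packet payload own its bytes through std::vector. RawOutput is a stand-in for the two rknn_output fields used here (a buffer pointer and its size in bytes), and OwnedOutputs and CopyOutputs are hypothetical names, not part of the original code or the RKNN API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the fields of rknn_output used here (assumption: a raw buffer
// pointer plus its size in bytes).
struct RawOutput {
  const void* buf;
  size_t size;
};

// Hypothetical replacement for RknnOutputs: each tensor owns its bytes, so
// destroying the struct (and the packet holding it) releases the memory.
struct OwnedOutputs {
  std::vector<std::vector<int8_t>> tensors;
};

// Deep-copies every raw buffer. Returns false if any buffer is missing or
// empty, leaving `owned` with the tensors copied so far.
bool CopyOutputs(const RawOutput* raw, size_t count, OwnedOutputs* owned) {
  owned->tensors.clear();
  for (size_t i = 0; i < count; ++i) {
    if (raw[i].buf == nullptr || raw[i].size == 0) {
      return false;
    }
    const int8_t* begin = static_cast<const int8_t*>(raw[i].buf);
    owned->tensors.emplace_back(begin, begin + raw[i].size);
  }
  return true;
}
```

The trade-off is that the downstream calculator would read tensors[i].data() instead of outputs[i].buf, so both calculators would need to agree on the new struct.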
So that the calculator above gets built, add the following to the BUILD file in the mediapipe/calculators/rknn folder:
cc_library(
name = "rknn_yolov5_calculator",
hdrs = ["//mediapipe/framework/formats/rknn:output.h"],
srcs = ["rknn_yolov5_calculator.cc"],
copts = select({
"//mediapipe:ios": [
"-x objective-c++",
"-fobjc-arc", # enable reference-counting
],
"//conditions:default": [],
}),
deps = [
":rknn_yolov5_calculator_cc_proto",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework:packet",
"//mediapipe/framework:timestamp",
"//mediapipe/framework/formats:image_frame",
"//mediapipe/framework/formats:image_frame_opencv",
"//mediapipe/framework/formats:video_stream_header",
"//mediapipe/framework/port:opencv_core",
"//mediapipe/framework/port:opencv_imgproc",
"//mediapipe/framework/port:logging",
"//mediapipe/framework/port:ret_check",
"//mediapipe/framework/port:status",
"//mediapipe/framework/stream_handler:fixed_size_input_stream_handler",
"@com_google_absl//absl/log:absl_check",
"@com_google_absl//absl/log:absl_log",
"@com_google_absl//absl/memory",
"@com_google_absl//absl/status",
"@com_google_absl//absl/strings",
],
alwayslink = 1,
)
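2.2 Adding PostProcess
The PostProcess node decodes the RknnYolov5 output into bounding boxes, confidences, and classes, then overlays them on the source image. The steps for adding it are similar to section 2.1.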
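As a preview of what "decoding" means here, the sketch below shows the YOLOv5 box formula that the process() function in the listing further down applies to each anchor prediction after dequantization: the center offset is sigmoid(t) * 2 - 0.5 plus the grid cell index, scaled by the stride, and the size is (sigmoid(t) * 2)^2 times the anchor size. The Box struct and the DecodeBox name are made up for this illustration.

```cpp
#include <cmath>

// Box in model-input pixels: top-left corner plus width and height.
struct Box {
  float x;
  float y;
  float w;
  float h;
};

inline float Sigmoid(float v) { return 1.0f / (1.0f + std::exp(-v)); }

// Decodes one YOLOv5 anchor prediction (already dequantized to float) at grid
// cell (col, row). Hypothetical helper for illustration.
Box DecodeBox(float tx, float ty, float tw, float th, int col, int row,
              int stride, int anchor_w, int anchor_h) {
  // Box center in pixels.
  float cx = (Sigmoid(tx) * 2.0f - 0.5f + col) * stride;
  float cy = (Sigmoid(ty) * 2.0f - 0.5f + row) * stride;
  // Box size relative to the anchor.
  float sw = Sigmoid(tw) * 2.0f;
  float sh = Sigmoid(th) * 2.0f;
  float w = sw * sw * anchor_w;
  float h = sh * sh * anchor_h;
  // Convert center to top-left corner.
  return Box{cx - w / 2.0f, cy - h / 2.0f, w, h};
}
```

With all four raw values at 0 (sigmoid 0.5), the box is centered in its grid cell and has exactly the anchor's size, which is a handy sanity check when debugging a decoder.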
2.2.1 Options file
Add post_process_calculator.proto to the mediapipe/calculators/rknn folder with the following content:
// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
package mediapipe;
import "mediapipe/framework/calculator.proto";
message PostProcessCalculatorOptions {
// Path to the label list file (one class name per line)
optional string label_map_path = 1;
optional float box_conf_threshold = 2 [default = -1.0];
optional float nms_threshold = 3 [default = -1.0];
}
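A note on how box_conf_threshold is used: the listing further down does not dequantize every confidence value before comparing it with the threshold. Instead it converts the threshold once into the int8 domain (inverse sigmoid, then affine quantization) and compares raw int8 values directly. The sketch below shows that conversion in isolation; the function names are hypothetical, and the threshold is assumed to lie strictly between 0 and 1 (the proto default of -1.0 would not be a valid input).

```cpp
#include <cmath>
#include <cstdint>

// Inverse of the sigmoid: the logit whose sigmoid equals p (0 < p < 1).
inline float Logit(float p) { return -std::log(1.0f / p - 1.0f); }

// Affine quantization to int8 with saturation: q = real / scale + zp.
inline int8_t QuantizeToInt8(float real, int32_t zp, float scale) {
  float q = real / scale + static_cast<float>(zp);
  if (q < -128.0f) q = -128.0f;
  if (q > 127.0f) q = 127.0f;
  return static_cast<int8_t>(q);  // truncates toward zero
}

// Converts a probability threshold into the int8 domain of one output branch.
inline int8_t ThresholdToInt8(float prob_threshold, int32_t zp, float scale) {
  return QuantizeToInt8(Logit(prob_threshold), zp, scale);
}
```

Because sigmoid and the affine mapping are both monotonically increasing (for a positive scale), comparing raw values against the converted threshold selects the same candidates as comparing probabilities, at a fraction of the cost.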
In the BUILD file of the mediapipe/calculators/rknn folder, add:
mediapipe_proto_library(
name = "post_process_calculator_proto",
srcs = ["post_process_calculator.proto"],
deps = [
"//mediapipe/framework:calculator_options_proto",
"//mediapipe/framework:calculator_proto",
],
)
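The nms_threshold option controls non-max suppression, which removes lower-scoring boxes that overlap a higher-scoring box of the same class too much. The sketch below is a compact greedy version of that idea, using the same inclusive-pixel "+1" IoU convention as CalculateOverlap in the listing that follows. It is a simplified illustration with made-up names (Det, IoU, Nms), not the calculator's actual implementation, which works on index arrays instead.

```cpp
#include <algorithm>
#include <vector>

// One detection: top-left corner plus size in pixels, score, and class.
struct Det {
  float x, y, w, h;
  float score;
  int class_id;
};

// Intersection over union with the inclusive-pixel "+1" convention.
float IoU(const Det& a, const Det& b) {
  float iw = std::max(0.0f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x) + 1.0f);
  float ih = std::max(0.0f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y) + 1.0f);
  float inter = iw * ih;
  float uni = (a.w + 1.0f) * (a.h + 1.0f) + (b.w + 1.0f) * (b.h + 1.0f) - inter;
  return uni <= 0.0f ? 0.0f : inter / uni;
}

// Greedy per-class NMS. Returns the kept detections, highest score first.
std::vector<Det> Nms(std::vector<Det> dets, float iou_threshold) {
  std::sort(dets.begin(), dets.end(),
            [](const Det& a, const Det& b) { return a.score > b.score; });
  std::vector<Det> kept;
  for (const Det& d : dets) {
    bool suppressed = false;
    for (const Det& k : kept) {
      // Only boxes of the same class can suppress each other.
      if (k.class_id == d.class_id && IoU(k, d) > iou_threshold) {
        suppressed = true;
        break;
      }
    }
    if (!suppressed) {
      kept.push_back(d);
    }
  }
  return kept;
}
```

A lower threshold suppresses more aggressively, so heavily overlapping objects of the same class may be merged into one detection.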
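2.2.2 Calculator implementation
Open loads the label text file and reads the side packet passed down from the RknnYolov5 node (the quantization scale factors and zero points of the model). Process uses that side packet together with the RknnOutputs to draw the post-processing results onto the source frame passed down from FlowLimiter. The code is as follows: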
// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
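#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <set>
#include <vector>
#include "absl/status/status.h"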
#include "mediapipe/calculators/rknn/post_process_calculator.pb.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/formats/video_stream_header.h"
#include "mediapipe/framework/formats/rknn/output.h"
#include "mediapipe/framework/packet.h"
#include "mediapipe/framework/port/opencv_core_inc.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/opencv_imgcodecs_inc.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/framework/port/status.h"
#include "mediapipe/framework/timestamp.h"
#include "opencv2/imgcodecs.hpp"
#include "rknn_api.h"
namespace mediapipe {
namespace {
constexpr char kRknnOutputTag[] = "RKNNOUTPUT";
// constexpr char kDetectResult[] = "DETECTRESULT";
constexpr char kScaleZpsTag[] = "SCALEZPS";
constexpr char kImageFrameTag[] = "IMAGE";
} // namespace
class PostProcessCalculator : public CalculatorBase {
public:
PostProcessCalculator() = default;
~PostProcessCalculator() override = default;
static absl::Status GetContract(CalculatorContract* cc);
absl::Status Open(CalculatorContext* cc) override;
absl::Status Process(CalculatorContext* cc) override;
absl::Status Close(CalculatorContext* cc) override;
private:
::mediapipe::PostProcessCalculatorOptions options_;
char* labels_[OBJ_CLASS_NUM] = {nullptr};
const int anchor0[6] = {10, 13, 16, 30, 33, 23};
const int anchor1[6] = {30, 61, 62, 45, 59, 119};
const int anchor2[6] = {116, 90, 156, 198, 373, 326};
ScaleZps scale_zps_;
float box_conf_threshold_;
float nms_threshold_;
char* ReadLine(FILE* fp, char* buffer, int* len);
int ReadLines(const char* fileName, char* lines[], int max_line);
int LoadLabelName(const char* locationFilename, char* label[]);
inline int clamp(float val, int min, int max) { return val > min ? (val < max ? val : max) : min; }
float CalculateOverlap(float xmin0, float ymin0, float xmax0, float ymax0, float xmin1, float ymin1, float xmax1,
float ymax1);
int nms(int validCount, std::vector<float>& outputLocations, std::vector<int> classIds, std::vector<int>& order,
int filterId, float threshold);
int quick_sort_indice_inverse(std::vector<float>& input, int left, int right, std::vector<int>& indices);
inline float sigmoid(float x) { return 1.0 / (1.0 + expf(-x)); }
inline float unsigmoid(float y) { return -1.0 * logf((1.0 / y) - 1.0); }
inline int32_t __clip(float val, float min, float max)
{
float f = val <= min ? min : (val >= max ? max : val);
return f;
}
int8_t qnt_f32_to_affine(float f32, int32_t zp, float scale);
float deqnt_affine_to_f32(int8_t qnt, int32_t zp, float scale) { return ((float)qnt - (float)zp) * scale; }
int process(int8_t* input, int* anchor, int grid_h, int grid_w, int height, int width, int stride,
std::vector<float>& boxes, std::vector<float>& objProbs, std::vector<int>& classId, float threshold,
int32_t zp, float scale);
int post_process(int8_t* input0, int8_t* input1, int8_t* input2, int model_in_h, int model_in_w, float conf_threshold,
float nms_threshold, float scale_w, float scale_h, std::vector<int32_t>& qnt_zps,
std::vector<float>& qnt_scales, detect_result_group_t* group);
void deinitPostProcess();
};
REGISTER_CALCULATOR(PostProcessCalculator);
// static
absl::Status PostProcessCalculator::GetContract(CalculatorContract* cc)
{
const auto& options = cc->Options<::mediapipe::PostProcessCalculatorOptions>();
RET_CHECK(!options.label_map_path().empty())
<< "label_map_path in options is required.";
// Side packets.
cc->InputSidePackets().Tag(kScaleZpsTag).Set<ScaleZps>();
if (cc->Inputs().HasTag(kRknnOutputTag))
{
// RET_CHECK(cc->Outputs().HasTag(kDetectResult));
cc->Inputs().Tag(kRknnOutputTag).Set<RknnOutputs>();
cc->Inputs().Tag(kImageFrameTag).Set<ImageFrame>();
// cc->Outputs().Tag(kDetectResult).Set<detect_result_group_t>();
cc->Outputs().Tag(kImageFrameTag).Set<ImageFrame>();
}
return absl::OkStatus();
}
char* PostProcessCalculator::ReadLine(FILE* fp, char* buffer, int* len)
{
int ch;
int i = 0;
size_t buff_len = 0;
buffer = (char*)malloc(buff_len + 1);
if (!buffer)
return NULL; // Out of memory
while ((ch = fgetc(fp)) != '\n' && ch != EOF) {
buff_len++;
void* tmp = realloc(buffer, buff_len + 1);
if (tmp == NULL) {
free(buffer);
return NULL; // Out of memory
}
buffer = (char*)tmp;
buffer[i] = (char)ch;
i++;
}
buffer[i] = '\0';
*len = buff_len;
// Detect end
if (ch == EOF && (i == 0 || ferror(fp))) {
free(buffer);
return NULL;
}
return buffer;
}
int PostProcessCalculator::ReadLines(const char* fileName, char* lines[], int max_line)
{
FILE* file = fopen(fileName, "r");
char* s = NULL;
int i = 0;
int n = 0;
if (file == NULL) {
printf("Open %s fail!\n", fileName);
return -1;
}
while ((s = ReadLine(file, s, &n)) != NULL) {
lines[i++] = s;
if (i >= max_line)
break;
}
fclose(file);
return i;
}
int PostProcessCalculator::LoadLabelName(const char* locationFilename, char* label[])
{
printf("loadLabelName %s\n", locationFilename);
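// Propagate the failure so that the check in Open can actually trigger.
int count = ReadLines(locationFilename, label, OBJ_CLASS_NUM);
return count < 0 ? -1 : 0;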
}
absl::Status PostProcessCalculator::Open(CalculatorContext* cc)
{
options_ = cc->Options<::mediapipe::PostProcessCalculatorOptions>();
// threshold
box_conf_threshold_ = options_.box_conf_threshold();
nms_threshold_ = options_.nms_threshold();
// label name
int ret = 0;
ret = LoadLabelName(options_.label_map_path().c_str(), labels_);
if (ret < 0)
{
printf("load label name error ret=: %d\n", ret);
return absl::UnavailableError("load label name error");
}
// side packets.
scale_zps_ = cc->InputSidePackets().Tag(kScaleZpsTag).Get<ScaleZps>();
return absl::OkStatus();
}
float PostProcessCalculator::CalculateOverlap(float xmin0, float ymin0, float xmax0, float ymax0, float xmin1, float ymin1, float xmax1,
float ymax1)
{
float w = fmax(0.f, fmin(xmax0, xmax1) - fmax(xmin0, xmin1) + 1.0);
float h = fmax(0.f, fmin(ymax0, ymax1) - fmax(ymin0, ymin1) + 1.0);
float i = w * h;
float u = (xmax0 - xmin0 + 1.0) * (ymax0 - ymin0 + 1.0) + (xmax1 - xmin1 + 1.0) * (ymax1 - ymin1 + 1.0) - i;
return u <= 0.f ? 0.f : (i / u);
}
int PostProcessCalculator::nms(int validCount, std::vector<float>& outputLocations, std::vector<int> classIds, std::vector<int>& order,
int filterId, float threshold)
{
for (int i = 0; i < validCount; ++i) {
int n = order[i];
// classIds is indexed by detection index, so look it up through order.
if (n == -1 || classIds[n] != filterId) {
continue;
}
for (int j = i + 1; j < validCount; ++j) {
int m = order[j];
if (m == -1 || classIds[m] != filterId) {
continue;
}
float xmin0 = outputLocations[n * 4 + 0];
float ymin0 = outputLocations[n * 4 + 1];
float xmax0 = outputLocations[n * 4 + 0] + outputLocations[n * 4 + 2];
float ymax0 = outputLocations[n * 4 + 1] + outputLocations[n * 4 + 3];
float xmin1 = outputLocations[m * 4 + 0];
float ymin1 = outputLocations[m * 4 + 1];
float xmax1 = outputLocations[m * 4 + 0] + outputLocations[m * 4 + 2];
float ymax1 = outputLocations[m * 4 + 1] + outputLocations[m * 4 + 3];
float iou = CalculateOverlap(xmin0, ymin0, xmax0, ymax0, xmin1, ymin1, xmax1, ymax1);
if (iou > threshold) {
order[j] = -1;
}
}
}
return 0;
}
int PostProcessCalculator::quick_sort_indice_inverse(std::vector<float>& input, int left, int right, std::vector<int>& indices)
{
float key;
int key_index;
int low = left;
int high = right;
if (left < right) {
key_index = indices[left];
key = input[left];
while (low < high) {
while (low < high && input[high] <= key) {
high--;
}
input[low] = input[high];
indices[low] = indices[high];
while (low < high && input[low] >= key) {
low++;
}
input[high] = input[low];
indices[high] = indices[low];
}
input[low] = key;
indices[low] = key_index;
quick_sort_indice_inverse(input, left, low - 1, indices);
quick_sort_indice_inverse(input, low + 1, right, indices);
}
return low;
}
int8_t PostProcessCalculator::qnt_f32_to_affine(float f32, int32_t zp, float scale)
{
float dst_val = (f32 / scale) + zp;
int8_t res = (int8_t)__clip(dst_val, -128, 127);
return res;
}
int PostProcessCalculator::process(int8_t* input, int* anchor, int grid_h, int grid_w, int height, int width, int stride,
std::vector<float>& boxes, std::vector<float>& objProbs, std::vector<int>& classId, float threshold,
int32_t zp, float scale)
{
int validCount = 0;
int grid_len = grid_h * grid_w;
float thres = unsigmoid(threshold);
int8_t thres_i8 = qnt_f32_to_affine(thres, zp, scale);
for (int a = 0; a < 3; a++) {
for (int i = 0; i < grid_h; i++) {
for (int j = 0; j < grid_w; j++) {
int8_t box_confidence = input[(PROP_BOX_SIZE * a + 4) * grid_len + i * grid_w + j];
if (box_confidence >= thres_i8) {
int offset = (PROP_BOX_SIZE * a) * grid_len + i * grid_w + j;
int8_t* in_ptr = input + offset;
float box_x = sigmoid(deqnt_affine_to_f32(*in_ptr, zp, scale)) * 2.0 - 0.5;
float box_y = sigmoid(deqnt_affine_to_f32(in_ptr[grid_len], zp, scale)) * 2.0 - 0.5;
float box_w = sigmoid(deqnt_affine_to_f32(in_ptr[2 * grid_len], zp, scale)) * 2.0;
float box_h = sigmoid(deqnt_affine_to_f32(in_ptr[3 * grid_len], zp, scale)) * 2.0;
box_x = (box_x + j) * (float)stride;
box_y = (box_y + i) * (float)stride;
box_w = box_w * box_w * (float)anchor[a * 2];
box_h = box_h * box_h * (float)anchor[a * 2 + 1];
box_x -= (box_w / 2.0);
box_y -= (box_h / 2.0);
int8_t maxClassProbs = in_ptr[5 * grid_len];
int maxClassId = 0;
for (int k = 1; k < OBJ_CLASS_NUM; ++k) {
int8_t prob = in_ptr[(5 + k) * grid_len];
if (prob > maxClassProbs) {
maxClassId = k;
maxClassProbs = prob;
}
}
if (maxClassProbs>thres_i8){
objProbs.push_back(sigmoid(deqnt_affine_to_f32(maxClassProbs, zp, scale))* sigmoid(deqnt_affine_to_f32(box_confidence, zp, scale)));
classId.push_back(maxClassId);
validCount++;
boxes.push_back(box_x);
boxes.push_back(box_y);
boxes.push_back(box_w);
boxes.push_back(box_h);
}
}
}
}
}
return validCount;
}
int PostProcessCalculator::post_process(int8_t* input0, int8_t* input1, int8_t* input2, int model_in_h, int model_in_w, float conf_threshold,
float nms_threshold, float scale_w, float scale_h, std::vector<int32_t>& qnt_zps,
std::vector<float>& qnt_scales, detect_result_group_t* group)
{
memset(group, 0, sizeof(detect_result_group_t));
std::vector<float> filterBoxes;
std::vector<float> objProbs;
std::vector<int> classId;
// stride 8
int stride0 = 8;
int grid_h0 = model_in_h / stride0;
int grid_w0 = model_in_w / stride0;
int validCount0 = 0;
validCount0 = process(input0, (int*)anchor0, grid_h0, grid_w0, model_in_h, model_in_w, stride0, filterBoxes, objProbs,
classId, conf_threshold, qnt_zps[0], qnt_scales[0]);
// stride 16
int stride1 = 16;
int grid_h1 = model_in_h / stride1;
int grid_w1 = model_in_w / stride1;
int validCount1 = 0;
validCount1 = process(input1, (int*)anchor1, grid_h1, grid_w1, model_in_h, model_in_w, stride1, filterBoxes, objProbs,
classId, conf_threshold, qnt_zps[1], qnt_scales[1]);
// stride 32
int stride2 = 32;
int grid_h2 = model_in_h / stride2;
int grid_w2 = model_in_w / stride2;
int validCount2 = 0;
validCount2 = process(input2, (int*)anchor2, grid_h2, grid_w2, model_in_h, model_in_w, stride2, filterBoxes, objProbs,
classId, conf_threshold, qnt_zps[2], qnt_scales[2]);
int validCount = validCount0 + validCount1 + validCount2;
// no object detect
if (validCount <= 0) {
return 0;
}
std::vector<int> indexArray;
for (int i = 0; i < validCount; ++i) {
indexArray.push_back(i);
}
quick_sort_indice_inverse(objProbs, 0, validCount - 1, indexArray);
std::set<int> class_set(std::begin(classId), std::end(classId));
for (auto c : class_set) {
nms(validCount, filterBoxes, classId, indexArray, c, nms_threshold);
}
int last_count = 0;
group->count = 0;
/* box valid detect target */
for (int i = 0; i < validCount; ++i) {
if (indexArray[i] == -1 || last_count >= OBJ_NUMB_MAX_SIZE) {
continue;
}
int n = indexArray[i];
float x1 = filterBoxes[n * 4 + 0];
float y1 = filterBoxes[n * 4 + 1];
float x2 = x1 + filterBoxes[n * 4 + 2];
float y2 = y1 + filterBoxes[n * 4 + 3];
int id = classId[n];
float obj_conf = objProbs[i];
group->results[last_count].box.left = (int)(clamp(x1, 0, model_in_w) / scale_w);
group->results[last_count].box.top = (int)(clamp(y1, 0, model_in_h) / scale_h);
group->results[last_count].box.right = (int)(clamp(x2, 0, model_in_w) / scale_w);
group->results[last_count].box.bottom = (int)(clamp(y2, 0, model_in_h) / scale_h);
group->results[last_count].prop = obj_conf;
char* label = labels_[id];
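// Copy at most OBJ_NAME_MAX_SIZE - 1 bytes; group was zeroed by memset above, so name stays NUL-terminated.
strncpy(group->results[last_count].name, label, OBJ_NAME_MAX_SIZE - 1);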
// printf("result %2d: (%4d, %4d, %4d, %4d), %s\n", i, group->results[last_count].box.left,
// group->results[last_count].box.top,
// group->results[last_count].box.right, group->results[last_count].box.bottom, label);
last_count++;
}
group->count = last_count;
return 0;
}
absl::Status PostProcessCalculator::Process(CalculatorContext* cc)
{
// get org mat img
const auto& input_frame = cc->Inputs().Tag(kImageFrameTag).Get<ImageFrame>();
cv::Mat input_mat = mediapipe::formats::MatView(&input_frame);
// cv::imwrite("./out.jpg", input_mat);
mediapipe::ImageFormat::Format input_format = input_frame.Format();
// // Allocate memory for the output image
// std::unique_ptr<mediapipe::ImageFrame> output_frame(
// new mediapipe::ImageFrame(input_format, input_mat.cols, input_mat.rows));
// cv::Mat output_mat = mediapipe::formats::MatView(output_frame.get());
// output_mat = input_mat.clone();
// get rknn outputs
const auto& rknn_output = cc->Inputs().Tag(kRknnOutputTag).Get<RknnOutputs>();
// post process
float height = (float)640;
float width = (float)640;
float scale_w = width / input_mat.cols;
float scale_h = height / input_mat.rows;
detect_result_group_t detect_result_group;
post_process((int8_t*)rknn_output.outputs[0].buf, (int8_t*)rknn_output.outputs[1].buf, (int8_t*)rknn_output.outputs[2].buf, height, width,
box_conf_threshold_, nms_threshold_, scale_w, scale_h, scale_zps_.out_zps, scale_zps_.out_scales, &detect_result_group);
// Draw Objects
char text[256];
for (int i = 0; i < detect_result_group.count; i++)
{
detect_result_t* det_result = &(detect_result_group.results[i]);
snprintf(text, sizeof(text), "%s %.1f%%", det_result->name, det_result->prop * 100);
printf("%s @ (%d %d %d %d) %f\n", det_result->name, det_result->box.left, det_result->box.top,
det_result->box.right, det_result->box.bottom, det_result->prop);
int x1 = det_result->box.left;
int y1 = det_result->box.top;
int x2 = det_result->box.right;
int y2 = det_result->box.bottom;
cv::rectangle(input_mat, cv::Point(x1, y1), cv::Point(x2, y2), cv::Scalar(255, 0, 0, 255), 3);
cv::putText(input_mat, text, cv::Point(x1, y1 + 12), cv::FONT_HERSHEY_SIMPLEX, 0.6, cv::Scalar(0, 0, 255), 2);
}
// cc->Outputs().Tag(kImageFrameTag).Add(std::move(output_frame.release()), cc->InputTimestamp());
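// NOTE: this moves the frame out of the const input packet to avoid a copy. It is only safe when no other node still needs the same packet.
cc->Outputs().Tag(kImageFrameTag).AddPacket(MakePacket<ImageFrame>(std::move(const_cast<ImageFrame&>(input_frame))).At(cc->InputTimestamp()));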
return absl::OkStatus();
}
void PostProcessCalculator::deinitPostProcess()
{
for (int i = 0; i < OBJ_CLASS_NUM; i++)
{
if (labels_[i] != nullptr) {
free(labels_[i]);
labels_[i] = nullptr;
}
}
}
absl::Status PostProcessCalculator::Close(CalculatorContext* cc)
{
deinitPostProcess();
return absl::OkStatus();
}
} // namespace mediapipe
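The box arithmetic at the end of post_process can be isolated into a small standalone sketch. The names Box and MapToSource below are hypothetical and introduced only for illustration. The sketch assumes, as the scale_w and scale_h computation in Process() suggests, that the source frame was stretched to the model input size without letterbox padding.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical types and names for illustration only; they are not part of
// the PostProcessCalculator listing above.
struct Box {
  int left;
  int top;
  int right;
  int bottom;
};

// Maps an (x, y, w, h) box given in model-input pixels back to source-frame
// pixels. Assumes the source frame was stretched to model_w x model_h
// (no letterbox padding), so each axis has its own scale factor.
Box MapToSource(float x, float y, float w, float h,
                int model_w, int model_h, int src_w, int src_h) {
  assert(src_w > 0 && src_h > 0);
  const float scale_w = static_cast<float>(model_w) / src_w;
  const float scale_h = static_cast<float>(model_h) / src_h;
  const float max_x = static_cast<float>(model_w);
  const float max_y = static_cast<float>(model_h);

  auto clampf = [](float v, float lo, float hi) {
    return std::min(std::max(v, lo), hi);
  };

  Box box;
  // Clamp in model space first, then divide by the scale to reach source space.
  box.left = static_cast<int>(clampf(x, 0.0f, max_x) / scale_w);
  box.top = static_cast<int>(clampf(y, 0.0f, max_y) / scale_h);
  box.right = static_cast<int>(clampf(x + w, 0.0f, max_x) / scale_w);
  box.bottom = static_cast<int>(clampf(y + h, 0.0f, max_y) / scale_h);
  return box;
}
```

For a 1280x320 source frame and a 640x640 model input, a box at (100, 200) with size 50x100 maps to left 200, top 100, right 300, bottom 150. A box that extends past the model input is clamped to the 640-pixel edge before it is scaled.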
So that the build can compile the calculator above, add the following to the BUILD file in the mediapipe/calculators/rknn folder:
cc_library(
name = "post_process_calculator",
hdrs = ["//mediapipe/framework/formats/rknn:output.h"],
srcs = ["post_process_calculator.cc"],
copts = select({
"//mediapipe:ios": [
"-x objective-c++",
"-fobjc-arc", # enable reference-counting
],
"//conditions:default": [],
}),
deps = [
":post_process_calculator_cc_proto",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework:packet",
"//mediapipe/framework:timestamp",
"//mediapipe/framework/formats:image_frame",
"//mediapipe/framework/formats:image_frame_opencv",
"//mediapipe/framework/formats:video_stream_header",
"//mediapipe/framework/port:opencv_core",
"//mediapipe/framework/port:opencv_imgproc",
"//mediapipe/framework/port:opencv_imgcodecs",
"//mediapipe/framework/port:logging",
"//mediapipe/framework/port:ret_check",
"//mediapipe/framework/port:status",
"//mediapipe/framework/stream_handler:fixed_size_input_stream_handler",
"@com_google_absl//absl/log:absl_check",
"@com_google_absl//absl/log:absl_log",
"@com_google_absl//absl/memory",
"@com_google_absl//absl/status",
"@com_google_absl//absl/strings",
],
alwayslink = 1,
)
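2.3 Adding the graph
With the required calculators in place, the build system still needs to know which calculators the executable depends on. Create the file mediapipe/mediapipe/graphs/rknn_yolov5/BUILD with the following content: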
# Copyright 2019 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
load(
"//mediapipe/framework/tool:mediapipe_graph.bzl",
"mediapipe_binary_graph",
)
licenses(["notice"])
package(default_visibility = ["//visibility:public"])
cc_library(
name = "rknn_yolov5_calculators",
deps = [
"//mediapipe/calculators/core:concatenate_vector_calculator",
"//mediapipe/calculators/core:flow_limiter_calculator",
"//mediapipe/calculators/image:rga_calculator",
"//mediapipe/calculators/rknn:rknn_yolov5_calculator",
"//mediapipe/calculators/rknn:post_process_calculator",
"//mediapipe/calculators/core:previous_loopback_calculator",
"//mediapipe/calculators/core:split_vector_calculator",
"//mediapipe/calculators/video:opencv_video_decoder_calculator",
"//mediapipe/calculators/video:opencv_video_encoder_calculator",
],
)
3. Build and run
Create the folder mediapipe/mediapipe/examples/desktop/rknn_yolov5 and add a BUILD file inside it with the following content:
# Copyright 2019 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
licenses(["notice"])
package(default_visibility = ["//mediapipe/examples:__subpackages__"])
cc_binary(
name = "rknn_yolov5",
deps = [
"//mediapipe/examples/desktop:demo_run_graph_main",
"//mediapipe/graphs/rknn_yolov5:rknn_yolov5_calculators",
],
)
The rknn_yolov5 executable can now be built.
Build:
bazel-6.5.0-linux-arm64 build -c opt --copt=-g --define MEDIAPIPE_DISABLE_GPU=1 mediapipe/examples/desktop/rknn_yolov5:rknn_yolov5
Run:
bazel-bin/mediapipe/examples/desktop/rknn_yolov5/rknn_yolov5 --calculator_graph_config_file=mediapipe/graphs/rknn_yolov5/rknn_yolov5_desktop_live.pbtxt
The result is as follows:
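Summary
Note: the code in this article is only meant to illustrate the workflow. I have not put effort into its coding style, performance, memory usage, or hidden bugs, so take care when using it.
Overall, adding a custom graph and calculator is fairly simple. This comes from mediapipe's modular design: the core framework is separated from the business code, which makes it easy to extend while still performing well.
Later articles will cover some of mediapipe's concepts, workflow, and design ideas, such as how a graph is built and how nodes are scheduled. Stay tuned if you are interested.
To find out what happens next, tune in for the next installment…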