
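Introduction

In the previous article we covered how to add a new calculator and graph to MediaPipe, with a worked example that added an RGA calculator and graph on the RK3588 platform. This article goes one step further and adds an object-detection graph on the RK3588.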

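1. Defining the data flow

Let's start directly with the data-flow definition in the pbtxt file. The data-flow graph is shown below:
[Figure: data-flow graph of the object-detection pipeline]
The input video stream first passes through the FlowLimiter throttle and is then sent to the RGA module, which scales each frame to the 640x640 size expected by the yolov5 model. The scaled frame goes to RknnYolov5 for inference on the NPU, and PostProcess decodes the inference result into bounding boxes, classes, and weighted confidences. Finally, these results are drawn on the source frame output by FlowLimiter and displayed as an overlay.
Once RknnYolov5 has finished processing a frame, it sends a FINISHED signal to FlowLimiter to tell it that the next frame can be admitted for inference. RknnYolov5 also outputs a side packet of static data containing the scale factors and zero points of the model's quantized outputs, which PostProcess uses during post-processing. The details are covered later. The proto configuration file of the graph is as follows: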

# MediaPipe graph that performs object detection with yolov5 on rk3588
# Used in the examples in
# mediapipe/examples/desktop/rknn_yolov5:rknn_yolov5

# Images on CPU coming into and out of the graph.
input_stream: "input_video"
output_stream: "output_video"

# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for RknnYolov5Calculator
# downstream in the graph to finish producing the corresponding output before
# it passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images between this calculator and
# RknnYolov5Calculator to 1. This prevents the nodes in between from queuing up
# incoming images and data excessively, which leads to increased latency and
# memory usage, unwanted in real-time applications. It also eliminates
# unnecessary computation, e.g., an image scaled by RgaCalculator may get
# dropped downstream if RknnYolov5Calculator is still busy processing previous
# inputs.
node {
  calculator: "FlowLimiterCalculator"
  input_stream: "input_video"
  input_stream: "FINISHED:rknnoutput"
  input_stream_info: {
    tag_index: "FINISHED"
    back_edge: true
  }
  output_stream: "throttled_input_video"
}

# Transforms the input image on RGA to a 640x640 image. 
node {
  calculator: "RgaCalculator"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "IMAGE:transformed_input_video"
  node_options: {
    [type.googleapis.com/mediapipe.RgaCalculatorOptions] {
      output_width: 640
      output_height: 640
    }
  }
}


# Runs the rknn yolov5 model on the NPU.
node {
  calculator: "RknnYolov5Calculator"
  input_stream: "IMAGE:transformed_input_video"
  output_side_packet: "SCALEZPS:scalezps"
  output_stream: "RKNNOUTPUT:rknnoutput"
  node_options: {
    [type.googleapis.com/mediapipe.RknnYolov5CalculatorOptions] {
      model_path: "mediapipe/models/yolov5s-640-640.rknn"
    }
  }
}

# Decodes the rknn outputs, performs non-max suppression to remove redundant
# detections, and draws the results on the throttled source image.
node {
  calculator: "PostProcessCalculator"
  input_side_packet: "SCALEZPS:scalezps"
  input_stream: "RKNNOUTPUT:rknnoutput"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "IMAGE:output_video"
  node_options: {
    [type.googleapis.com/mediapipe.PostProcessCalculatorOptions] {
      box_conf_threshold: 0.45
      nms_threshold: 0.25
      label_map_path: "mediapipe/models/coco_80_labels_list.txt"
    }
  }
}
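
The throttling policy described for FlowLimiter (at most one frame in flight, with frames dropped while inference is busy) can be illustrated with a small stand-alone model. This is only a sketch of the policy, not MediaPipe's FlowLimiterCalculator; the names ToyFlowLimiter, OnFrame, OnFinished, and Replay are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Toy model of the throttling policy; NOT the real FlowLimiterCalculator.
class ToyFlowLimiter {
 public:
  explicit ToyFlowLimiter(int max_in_flight) : max_in_flight_(max_in_flight) {
    assert(max_in_flight > 0);
  }

  // Called for every incoming frame. Returns true if the frame is forwarded
  // downstream, false if it is dropped because inference is still busy.
  bool OnFrame() {
    if (in_flight_ >= max_in_flight_) return false;
    ++in_flight_;
    return true;
  }

  // Called when the downstream node reports completion (the FINISHED signal).
  void OnFinished() {
    if (in_flight_ > 0) --in_flight_;
  }

 private:
  int max_in_flight_;
  int in_flight_ = 0;
};

// Replays a sequence of events: 'F' = a frame arrives, 'D' = inference done.
// Returns, for each 'F' in order, whether that frame was admitted.
std::vector<bool> Replay(const char* events) {
  ToyFlowLimiter limiter(1);
  std::vector<bool> admitted;
  for (const char* e = events; *e != '\0'; ++e) {
    if (*e == 'F') {
      admitted.push_back(limiter.OnFrame());
    } else if (*e == 'D') {
      limiter.OnFinished();
    }
  }
  return admitted;
}
```

For example, with the event sequence "FFFDF" the first frame is admitted, the next two are dropped because inference has not finished, and the frame that arrives after the completion signal is admitted again.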

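2. Adding the calculators and the graph

2.1 Adding RknnYolov5

2.1.1 Configuration file

Add a rknn_yolov5_calculator.proto file to the mediapipe/calculators/rknn folder. Its purpose is to expose the model path as a configurable option. The contents are as follows: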

// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

syntax = "proto2";

package mediapipe;

import "mediapipe/framework/calculator.proto";

message RknnYolov5CalculatorOptions {

  // Path to the rknn yolo model
  optional string model_path = 1;
}

Create a BUILD file in the mediapipe/calculators/rknn folder and add the following:

mediapipe_proto_library(
    name = "rknn_yolov5_calculator_proto",
    srcs = ["rknn_yolov5_calculator.proto"],
    deps = [
        "//mediapipe/framework:calculator_options_proto",
        "//mediapipe/framework:calculator_proto",
    ],
)

In mediapipe/framework/tool/mediapipe_proto_allowlist.bzl, add the following entry to rewrite_target_list:

	"rknn_yolov5_calculator_proto",

With this in place, the model path field can be used in code once the project has been rebuilt.

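2.1.2 Header file definitions

Because this article runs yolo inference on the RK3588 platform, some platform-specific data structures have to be defined so that the two calculators can pass data to each other.
Add an output.h file under mediapipe/framework/formats/rknn with the following contents: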

// Copyright 2020 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#ifndef MEDIAPIPE_FRAMEWORK_FORMATS_RKNN_H_
#define MEDIAPIPE_FRAMEWORK_FORMATS_RKNN_H_

#include <algorithm>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <memory>
#include <numeric>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

#include "rknn_api.h"

namespace mediapipe {

#define MAX_RKNN_OUTPUT_NUM 5
#define OBJ_NAME_MAX_SIZE 16
#define OBJ_NUMB_MAX_SIZE 64
#define OBJ_CLASS_NUM     80
#define PROP_BOX_SIZE     (5+OBJ_CLASS_NUM)

typedef struct _RknnOutputs 
{
    uint32_t num;                                      /* the num of outputs*/
    rknn_output outputs[MAX_RKNN_OUTPUT_NUM];

} RknnOutputs;  

typedef struct _ScaleZps
{
    std::vector<float>    out_scales;
    std::vector<int32_t>  out_zps;
} ScaleZps;  

typedef struct _BOX_RECT
{
    int left;
    int right;
    int top;
    int bottom;
} BOX_RECT;

typedef struct __detect_result_t
{
    char name[OBJ_NAME_MAX_SIZE];
    BOX_RECT box;
    float prop;
} detect_result_t;

typedef struct _detect_result_group_t
{
    int id;
    int count;
    detect_result_t results[OBJ_NUMB_MAX_SIZE];
} detect_result_group_t;


}  // namespace mediapipe

#endif  // MEDIAPIPE_FRAMEWORK_FORMATS_RKNN_H_

Also add a BUILD file under mediapipe/framework/formats/rknn that exports this header. Its contents are as follows:

# Copyright 2019 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

package(
    default_visibility = ["//visibility:private"],
    features = ["-layering_check"],
)

licenses(["notice"])

exports_files([
    "output.h",
])

The custom data structures defined in the header can now be shared between the two calculators.
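
To make the role of the ScaleZps structure concrete, here is a minimal sketch of how per-output scale factors and zero points turn raw int8 values back into real numbers, using the affine relation real = (q - zero_point) * scale. ScaleZpsSketch is a stand-in that mirrors the fields of ScaleZps so that the example compiles without rknn_api.h; DequantizeOutput is a hypothetical helper and is not part of the calculators in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in mirroring the fields of ScaleZps from output.h (assumption: one
// scale and one zero point per model output).
struct ScaleZpsSketch {
  std::vector<float> out_scales;
  std::vector<int32_t> out_zps;
};

// Affine dequantization of a single value: real = (q - zero_point) * scale.
float Dequantize(int8_t q, int32_t zp, float scale) {
  return (static_cast<float>(q) - static_cast<float>(zp)) * scale;
}

// Dequantizes one raw int8 output tensor with the parameters of output
// number `index`.
std::vector<float> DequantizeOutput(const std::vector<int8_t>& raw,
                                    const ScaleZpsSketch& params,
                                    std::size_t index) {
  assert(index < params.out_scales.size());
  assert(index < params.out_zps.size());
  std::vector<float> result;
  result.reserve(raw.size());
  for (int8_t q : raw) {
    result.push_back(
        Dequantize(q, params.out_zps[index], params.out_scales[index]));
  }
  return result;
}
```

With a scale of 0.5 and a zero point of -10, the raw values -10, 0, and 10 map to 0.0, 5.0, and 10.0 respectively. Because these parameters are fixed for a given model, sending them once as a side packet is enough.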

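2.1.3 Calculator implementation

As before, the calculator derives from CalculatorBase and implements the GetContract, Open, Process, and Close methods. GetContract checks and declares the input and output types. Open loads the yolov5 model, initializes the rknn runtime, dumps some information about the model, and finally emits the scale factors and zero points of the quantized outputs as a side packet for the next node. Process runs inference on the NPU for each input image, copies the results into a RknnOutputs struct, and sends it to the next node. Close releases the loaded model data and the rknn context. The full code is as follows: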

// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "absl/status/status.h"
#include "mediapipe/calculators/rknn/rknn_yolov5_calculator.pb.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/formats/video_stream_header.h"
#include "mediapipe/framework/formats/rknn/output.h"
#include "mediapipe/framework/packet.h"
#include "mediapipe/framework/port/opencv_core_inc.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/framework/port/status.h"
#include "mediapipe/framework/timestamp.h"

#include "rknn_api.h"

typedef int DimensionsPacketType[2];

#define DEFAULT_SCALE_MODE mediapipe::ScaleMode_Mode_STRETCH

namespace mediapipe {

namespace {
constexpr char kImageFrameTag[] = "IMAGE";
constexpr char kRknnOutputTag[] = "RKNNOUTPUT";
constexpr char kScaleZpsTag[] = "SCALEZPS";

}  // namespace

class RknnYolov5Calculator : public CalculatorBase {
 public:
  RknnYolov5Calculator() = default;
  ~RknnYolov5Calculator() override = default;

  static absl::Status GetContract(CalculatorContract* cc);

  absl::Status Open(CalculatorContext* cc) override;
  absl::Status Process(CalculatorContext* cc) override;
  absl::Status Close(CalculatorContext* cc) override;

 private:

  ::mediapipe::RknnYolov5CalculatorOptions options_;

  // char *model_path_ = nullptr;

  rknn_context   ctx_;
  unsigned char* model_data_ = nullptr;  
  rknn_input_output_num io_num_;

  int input_channel_ = 3;
  int input_width_   = 0;
  int input_height_  = 0;

  unsigned char* load_data(FILE* fp, size_t ofst, size_t sz);
  unsigned char* load_model(const char* filename, int* model_size);

  void dump_tensor_attr(rknn_tensor_attr* attr);

};
REGISTER_CALCULATOR(RknnYolov5Calculator);

// static
absl::Status RknnYolov5Calculator::GetContract(CalculatorContract* cc) 
{

  const auto& options = cc->Options<::mediapipe::RknnYolov5CalculatorOptions>();

  RET_CHECK(!options.model_path().empty())
      << "Either model as side packet or model path in options is required.";

  // Side packets
  cc->OutputSidePackets().Tag(kScaleZpsTag).Set<ScaleZps>();

  // Only one input can be set, and the output type must match.
  RET_CHECK(cc->Inputs().HasTag(kImageFrameTag));

  if (cc->Inputs().HasTag(kImageFrameTag)) 
  {
    RET_CHECK(cc->Outputs().HasTag(kRknnOutputTag));
    cc->Inputs().Tag(kImageFrameTag).Set<ImageFrame>();
    cc->Outputs().Tag(kRknnOutputTag).Set<RknnOutputs>();
  }

  // Assign this calculator's default InputStreamHandler.
  cc->SetInputStreamHandler("FixedSizeInputStreamHandler");

  return absl::OkStatus();
}

void RknnYolov5Calculator::dump_tensor_attr(rknn_tensor_attr* attr)
{
  std::string shape_str = attr->n_dims < 1 ? "" : std::to_string(attr->dims[0]);
  for (int i = 1; i < attr->n_dims; ++i) {
    shape_str += ", " + std::to_string(attr->dims[i]);
  }

  printf("  index=%d, name=%s, n_dims=%d, dims=[%s], n_elems=%d, size=%d, w_stride = %d, size_with_stride=%d, fmt=%s, "
         "type=%s, qnt_type=%s, "
         "zp=%d, scale=%f\n",
         attr->index, attr->name, attr->n_dims, shape_str.c_str(), attr->n_elems, attr->size, attr->w_stride,
         attr->size_with_stride, get_format_string(attr->fmt), get_type_string(attr->type),
         get_qnt_type_string(attr->qnt_type), attr->zp, attr->scale);
}


unsigned char* RknnYolov5Calculator::load_data(FILE* fp, size_t ofst, size_t sz)
{
  unsigned char* data;
  int            ret;

  data = NULL;

  if (NULL == fp) {
    return NULL;
  }

  ret = fseek(fp, ofst, SEEK_SET);
  if (ret != 0) {
    printf("blob seek failure.\n");
    return NULL;
  }

  data = (unsigned char*)malloc(sz);
  if (data == NULL) {
    printf("buffer malloc failure.\n");
    return NULL;
  }
  ret = fread(data, 1, sz, fp);
  return data;
}
unsigned char* RknnYolov5Calculator::load_model(const char* filename, int* model_size)
{
  FILE*          fp;
  unsigned char* data;

  fp = fopen(filename, "rb");
  if (NULL == fp) 
  {
    printf("Open file %s failed.\n", filename);
    return NULL;
  }

  fseek(fp, 0, SEEK_END);
  int size = ftell(fp);

  data = load_data(fp, 0, size);

  fclose(fp);

  *model_size = size;
  return data;
}

absl::Status RknnYolov5Calculator::Open(CalculatorContext* cc) 
{
  options_ = cc->Options<::mediapipe::RknnYolov5CalculatorOptions>();
  // model_path_ = options_.model_path().c_str();

  // Create the neural network
  int ret = 0;
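  ABSL_LOG(INFO) << "Loading model...";
  int            model_data_size = 0;
  model_data_      = load_model(options_.model_path().c_str(), &model_data_size);
  // load_model returns NULL if the file cannot be opened or read.
  if (model_data_ == nullptr)
  {
    return absl::NotFoundError("Failed to load rknn model file");
  }
  ret              = rknn_init(&ctx_, model_data_, model_data_size, 0, NULL);
  if (ret < 0)
  {
    printf("rknn_init error ret=%d\n", ret);
    return absl::UnavailableError("rknn_init error");
  }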

  rknn_sdk_version version;
  ret = rknn_query(ctx_, RKNN_QUERY_SDK_VERSION, &version, sizeof(rknn_sdk_version));
  if (ret < 0)
  {
    printf("rknn_query error ret=%d\n", ret);
    return absl::UnavailableError("rknn_query error");
  }
  printf("sdk version: %s driver version: %s\n", version.api_version, version.drv_version);

  ret = rknn_query(ctx_, RKNN_QUERY_IN_OUT_NUM, &io_num_, sizeof(io_num_));
  if (ret < 0)
  {
    printf("rknn_query error ret=%d\n", ret);
    return absl::UnavailableError("rknn_query error");
  }
  printf("model input num: %d, output num: %d\n", io_num_.n_input, io_num_.n_output);

  rknn_tensor_attr input_attrs[io_num_.n_input];
  memset(input_attrs, 0, sizeof(input_attrs));
  for (int i = 0; i < io_num_.n_input; i++)
  {
    input_attrs[i].index = i;
    ret                  = rknn_query(ctx_, RKNN_QUERY_INPUT_ATTR, &(input_attrs[i]), sizeof(rknn_tensor_attr));
    if (ret < 0) {
      printf("rknn_query error ret=%d\n", ret);
      return absl::UnavailableError("rknn_query error");
    }
    dump_tensor_attr(&(input_attrs[i]));
  }

  rknn_tensor_attr output_attrs[io_num_.n_output];
  memset(output_attrs, 0, sizeof(output_attrs));
  for (int i = 0; i < io_num_.n_output; i++)
  {
    output_attrs[i].index = i;
    ret                   = rknn_query(ctx_, RKNN_QUERY_OUTPUT_ATTR, &(output_attrs[i]), sizeof(rknn_tensor_attr));
    if (ret < 0) {
      printf("rknn_query error ret=%d\n", ret);
      return absl::UnavailableError("rknn_query error");
    }
    dump_tensor_attr(&(output_attrs[i]));
  }

  if (input_attrs[0].fmt == RKNN_TENSOR_NCHW) 
  {
    printf("model is NCHW input fmt\n");
    input_channel_ = input_attrs[0].dims[1];
    input_height_  = input_attrs[0].dims[2];
    input_width_   = input_attrs[0].dims[3];
  } 
  else 
  {
    printf("model is NHWC input fmt\n");
    input_height_  = input_attrs[0].dims[1];
    input_width_   = input_attrs[0].dims[2];
    input_channel_ = input_attrs[0].dims[3];
  }

  printf("model input height=%d, width=%d, channel=%d\n", input_height_, input_width_, input_channel_);

  // Pass side packets
  ScaleZps scale_zps;
  for (int i = 0; i < io_num_.n_output; ++i) 
  {
    scale_zps.out_scales.push_back(output_attrs[i].scale);
    scale_zps.out_zps.push_back(output_attrs[i].zp);
  }  
  cc->OutputSidePackets().Tag(kScaleZpsTag).Set(MakePacket<ScaleZps>(scale_zps));

  return absl::OkStatus();
}

absl::Status RknnYolov5Calculator::Process(CalculatorContext* cc) 
{

  // Convert ImageFrame to OpenCV Mat
  const auto& input_frame = cc->Inputs().Tag(kImageFrameTag).Get<ImageFrame>();
  cv::Mat input_mat = mediapipe::formats::MatView(&input_frame);

  int img_chns   = input_mat.channels();
  int img_width  = input_mat.cols;
  int img_height = input_mat.rows;
  if (img_width != input_width_ || img_height != input_height_ || img_chns != input_channel_) 
  {
    return absl::UnavailableError("Image size is not correct");
  }

  // Set rknn input and output
  rknn_input inputs[1];
  memset(inputs, 0, sizeof(inputs));
  inputs[0].index        = 0;
  inputs[0].type         = RKNN_TENSOR_UINT8;
  inputs[0].size         = input_width_ * input_height_ * input_channel_;
  inputs[0].fmt          = RKNN_TENSOR_NHWC;
  inputs[0].pass_through = 0;

  // Check if input_mat is valid
  if (input_mat.empty() || !input_mat.isContinuous()) 
  {
    return absl::InternalError("Invalid input image data");
  }
  inputs[0].buf = (void*)input_mat.data;

  // Run the rknn
  int ret = rknn_inputs_set(ctx_, io_num_.n_input, inputs);
  if (ret < 0) 
  {
    return absl::InternalError("Failed to set rknn inputs");
  }

  rknn_output outputs[io_num_.n_output];
  memset(outputs, 0, sizeof(outputs));
  for (int i = 0; i < io_num_.n_output; i++) 
  {
    outputs[i].want_float = 0;
  }

  ret = rknn_run(ctx_, NULL);
  if (ret < 0) 
  {
    return absl::InternalError("Failed to run rknn");
  }

  ret = rknn_outputs_get(ctx_, io_num_.n_output, outputs, nullptr);
  if (ret < 0) 
  {
    return absl::InternalError("Failed to get rknn outputs");
  }


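  // Output result
  // RknnOutputs holds a fixed-size array, so reject models with more outputs.
  if (io_num_.n_output > MAX_RKNN_OUTPUT_NUM)
  {
    rknn_outputs_release(ctx_, io_num_.n_output, outputs);
    return absl::InternalError("Model has more outputs than MAX_RKNN_OUTPUT_NUM");
  }
  std::unique_ptr<mediapipe::RknnOutputs> rknn_outputs(new mediapipe::RknnOutputs());

  rknn_outputs->num = io_num_.n_output;
  for (int i = 0; i < io_num_.n_output; i++)
  {
    if (!outputs[i].buf || outputs[i].size <= 0)
    {
      rknn_outputs_release(ctx_, io_num_.n_output, outputs);
      return absl::InternalError("Invalid output buffer or size");
    }

    memcpy(&rknn_outputs->outputs[i], &outputs[i], sizeof(rknn_output));

    // Deep-copy the buffer: the original is handed back to the rknn runtime
    // by rknn_outputs_release below.
    rknn_outputs->outputs[i].buf = calloc(1, outputs[i].size);
    if (!rknn_outputs->outputs[i].buf)
    {
      rknn_outputs_release(ctx_, io_num_.n_output, outputs);
      return absl::ResourceExhaustedError("Failed to allocate output buffer");
    }
    memcpy(rknn_outputs->outputs[i].buf, outputs[i].buf, outputs[i].size);
  }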

  cc->Outputs().Tag(kRknnOutputTag).Add(rknn_outputs.release(), cc->InputTimestamp());

  // Release rknn outputs
  rknn_outputs_release(ctx_, io_num_.n_output, outputs);

  return absl::OkStatus();
}

absl::Status RknnYolov5Calculator::Close(CalculatorContext* cc) 
{

  // release
  int ret = rknn_destroy(ctx_);
  if (ret < 0) 
  {
    return absl::UnknownError("Failed to destroy RKNN context");
  }

  if (model_data_) 
  {
    free(model_data_);
    model_data_ = nullptr;
  }

  return absl::OkStatus();
}

}  // namespace mediapipe

So that the calculator above can be compiled, add the following to the BUILD file in the mediapipe/calculators/rknn folder:

cc_library(
    name = "rknn_yolov5_calculator",
    hdrs = ["//mediapipe/framework/formats/rknn:output.h"],
    srcs = ["rknn_yolov5_calculator.cc"],
    copts = select({
        "//mediapipe:ios": [
            "-x objective-c++",
            "-fobjc-arc",  # enable reference-counting
        ],
        "//conditions:default": [],
    }),
    deps = [
        ":rknn_yolov5_calculator_cc_proto",
        "//mediapipe/framework:calculator_framework",
        "//mediapipe/framework:packet",
        "//mediapipe/framework:timestamp",
        "//mediapipe/framework/formats:image_frame",
        "//mediapipe/framework/formats:image_frame_opencv",
        "//mediapipe/framework/formats:video_stream_header",
        "//mediapipe/framework/port:opencv_core",
        "//mediapipe/framework/port:opencv_imgproc",
        "//mediapipe/framework/port:logging",
        "//mediapipe/framework/port:ret_check",
        "//mediapipe/framework/port:status",
        "//mediapipe/framework/stream_handler:fixed_size_input_stream_handler",
        "@com_google_absl//absl/log:absl_check",
        "@com_google_absl//absl/log:absl_log",
        "@com_google_absl//absl/memory",
        "@com_google_absl//absl/status",
        "@com_google_absl//absl/strings",
    ],
    alwayslink = 1,
)
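
The calculator's load_model and load_data helpers use malloc and fread and do not check how many bytes were actually read. As an alternative, the following sketch reads a whole file into a std::vector and reports failure on a missing, empty, or partially read file. ReadFileToBuffer is a hypothetical helper, not part of the article's code or of the rknn API. The resulting buffer could be passed to rknn_init through its data() and size(), provided it stays alive for as long as the original code keeps model_data_.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Reads the whole file at `path` into `out`. Returns false if the file cannot
// be opened, is empty, or could not be read completely.
bool ReadFileToBuffer(const std::string& path, std::vector<uint8_t>* out) {
  assert(out != nullptr);
  // Open at the end so that tellg() yields the file size.
  std::ifstream file(path, std::ios::binary | std::ios::ate);
  if (!file) return false;
  const std::streamsize size = file.tellg();
  if (size <= 0) return false;
  out->resize(static_cast<std::size_t>(size));
  file.seekg(0, std::ios::beg);
  file.read(reinterpret_cast<char*>(out->data()), size);
  // gcount() is the number of bytes actually read by the last read().
  return file.gcount() == size;
}
```

Because the vector owns the memory, no explicit free is needed in Close when this approach is used.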

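2.2 Adding PostProcess

The PostProcess node decodes the output of RknnYolov5 into bounding boxes, confidences, and classes, and then draws them as an overlay on the source image. The steps for adding it are similar to those in section 2.1.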
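
As a rough illustration of what "decoding" means here, the sketch below applies the YOLOv5 box formulas to already dequantized values for a single grid cell and anchor. It follows the same arithmetic as the process method shown later in this article, but DecodedBox, DecodeBox, and WeightedConfidence are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cmath>

// Box in model-input pixels, given by its top-left corner and size.
struct DecodedBox {
  float left;
  float top;
  float width;
  float height;
};

inline float Sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Decodes one prediction. tx, ty, tw, th are the dequantized raw outputs,
// (col, row) is the grid cell, stride is 8, 16, or 32, and anchor_w and
// anchor_h are the anchor dimensions for this prediction.
DecodedBox DecodeBox(float tx, float ty, float tw, float th, int col, int row,
                     int stride, float anchor_w, float anchor_h) {
  assert(stride > 0);
  // The centre is an offset within the cell, scaled to input pixels.
  const float cx = (Sigmoid(tx) * 2.0f - 0.5f + col) * stride;
  const float cy = (Sigmoid(ty) * 2.0f - 0.5f + row) * stride;
  // The size is a squared factor applied to the anchor dimensions.
  const float sw = Sigmoid(tw) * 2.0f;
  const float sh = Sigmoid(th) * 2.0f;
  const float w = sw * sw * anchor_w;
  const float h = sh * sh * anchor_h;
  return {cx - w / 2.0f, cy - h / 2.0f, w, h};
}

// Weighted confidence: objectness probability times best class probability.
float WeightedConfidence(float objectness_logit, float class_logit) {
  return Sigmoid(objectness_logit) * Sigmoid(class_logit);
}
```

For instance, with all raw values equal to 0 (so every sigmoid is 0.5), grid cell (col 3, row 2), stride 8, and a 10x13 anchor, the box is centred at (28, 20) with size 10x13, and the weighted confidence is 0.25.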

2.2.1 Configuration file

Add a post_process_calculator.proto file to the mediapipe/calculators/rknn folder with the following contents:

// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

syntax = "proto2";

package mediapipe;

import "mediapipe/framework/calculator.proto";

message PostProcessCalculatorOptions {

  // Path to the label map file (one class name per line)
  optional string label_map_path = 1;

  optional float box_conf_threshold = 2 [default = -1.0];

  optional float nms_threshold = 3 [default = -1.0];

}

Add the following to the BUILD file in the mediapipe/calculators/rknn folder:

mediapipe_proto_library(
    name = "post_process_calculator_proto",
    srcs = ["post_process_calculator.proto"],
    deps = [
        "//mediapipe/framework:calculator_options_proto",
        "//mediapipe/framework:calculator_proto",
    ],
)
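
Of the two thresholds in PostProcessCalculatorOptions, box_conf_threshold filters out low-confidence candidates, while nms_threshold is the IoU value above which a lower-scoring box of the same class is suppressed. The sketch below shows a greedy per-class NMS under simplifying assumptions: it uses a plain continuous IoU (without the +1 pixel convention of the article's CalculateOverlap) and sorts with the standard library. Box, IoU, and GreedyNms are hypothetical names for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Box {
  float left;
  float top;
  float width;
  float height;
};

// Intersection over union of two axis-aligned boxes.
float IoU(const Box& a, const Box& b) {
  const float iw = std::max(0.0f, std::min(a.left + a.width, b.left + b.width) -
                                      std::max(a.left, b.left));
  const float ih = std::max(0.0f, std::min(a.top + a.height, b.top + b.height) -
                                      std::max(a.top, b.top));
  const float inter = iw * ih;
  const float uni = a.width * a.height + b.width * b.height - inter;
  return uni <= 0.0f ? 0.0f : inter / uni;
}

// Returns the indices of the boxes that survive, highest score first. A box
// is suppressed when a higher-scoring kept box of the same class overlaps it
// with IoU greater than nms_threshold.
std::vector<std::size_t> GreedyNms(const std::vector<Box>& boxes,
                                   const std::vector<float>& scores,
                                   const std::vector<int>& class_ids,
                                   float nms_threshold) {
  assert(boxes.size() == scores.size());
  assert(boxes.size() == class_ids.size());
  std::vector<std::size_t> order(boxes.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return scores[a] > scores[b];
                   });
  std::vector<bool> suppressed(boxes.size(), false);
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < order.size(); ++i) {
    const std::size_t n = order[i];
    if (suppressed[n]) continue;
    kept.push_back(n);
    for (std::size_t j = i + 1; j < order.size(); ++j) {
      const std::size_t m = order[j];
      // Class ids are looked up through the sorted index, not the loop index.
      if (suppressed[m] || class_ids[m] != class_ids[n]) continue;
      if (IoU(boxes[n], boxes[m]) > nms_threshold) suppressed[m] = true;
    }
  }
  return kept;
}
```

A lower nms_threshold suppresses more aggressively: with a value of 0.25, two same-class boxes that overlap by a little more than a quarter of their union are already merged into one detection.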

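2.2.2 Calculator implementation

Open loads the label text file and reads the side packet passed down from the RknnYolov5 node (the scale factors and zero points of the quantized model outputs). Process uses the side-packet data and the RknnOutputs to draw the post-processing results onto the source frame passed down from FlowLimiter. The code is as follows: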

// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "absl/status/status.h"
#include "mediapipe/calculators/rknn/post_process_calculator.pb.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/formats/video_stream_header.h"
#include "mediapipe/framework/formats/rknn/output.h"
#include "mediapipe/framework/packet.h"
#include "mediapipe/framework/port/opencv_core_inc.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/opencv_imgcodecs_inc.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/framework/port/status.h"
#include "mediapipe/framework/timestamp.h"
#include "opencv2/imgcodecs.hpp"

#include "rknn_api.h"

namespace mediapipe {

namespace {
constexpr char kRknnOutputTag[] = "RKNNOUTPUT";
// constexpr char kDetectResult[] = "DETECTRESULT";
constexpr char kScaleZpsTag[] = "SCALEZPS";
constexpr char kImageFrameTag[] = "IMAGE";

}  // namespace

class PostProcessCalculator : public CalculatorBase {
 public:
  PostProcessCalculator() = default;
  ~PostProcessCalculator() override = default;

  static absl::Status GetContract(CalculatorContract* cc);

  absl::Status Open(CalculatorContext* cc) override;
  absl::Status Process(CalculatorContext* cc) override;
  absl::Status Close(CalculatorContext* cc) override;

 private:

  ::mediapipe::PostProcessCalculatorOptions options_;

  char* labels_[OBJ_CLASS_NUM];
  const int anchor0[6] = {10, 13, 16, 30, 33, 23};
  const int anchor1[6] = {30, 61, 62, 45, 59, 119};
  const int anchor2[6] = {116, 90, 156, 198, 373, 326};

  ScaleZps scale_zps_;

  float box_conf_threshold_; 
  float nms_threshold_;

  char* ReadLine(FILE* fp, char* buffer, int* len);
  int ReadLines(const char* fileName, char* lines[], int max_line);
  int LoadLabelName(const char* locationFilename, char* label[]);
  inline int clamp(float val, int min, int max) { return val > min ? (val < max ? val : max) : min; }
  float CalculateOverlap(float xmin0, float ymin0, float xmax0, float ymax0, float xmin1, float ymin1, float xmax1,
                              float ymax1);  
  int nms(int validCount, std::vector<float>& outputLocations, std::vector<int> classIds, std::vector<int>& order,
               int filterId, float threshold);
  int quick_sort_indice_inverse(std::vector<float>& input, int left, int right, std::vector<int>& indices);
  inline float sigmoid(float x) { return 1.0 / (1.0 + expf(-x)); }
  inline float unsigmoid(float y) { return -1.0 * logf((1.0 / y) - 1.0); }

  inline int32_t __clip(float val, float min, float max)
  {
    float f = val <= min ? min : (val >= max ? max : val);
    return f;
  }

  int8_t qnt_f32_to_affine(float f32, int32_t zp, float scale);
  float deqnt_affine_to_f32(int8_t qnt, int32_t zp, float scale) { return ((float)qnt - (float)zp) * scale; }

  int process(int8_t* input, int* anchor, int grid_h, int grid_w, int height, int width, int stride,
                    std::vector<float>& boxes, std::vector<float>& objProbs, std::vector<int>& classId, float threshold,
                    int32_t zp, float scale);

  int post_process(int8_t* input0, int8_t* input1, int8_t* input2, int model_in_h, int model_in_w, float conf_threshold,
                  float nms_threshold, float scale_w, float scale_h, std::vector<int32_t>& qnt_zps,
                  std::vector<float>& qnt_scales, detect_result_group_t* group);

  void deinitPostProcess();

};
REGISTER_CALCULATOR(PostProcessCalculator);

// static
absl::Status PostProcessCalculator::GetContract(CalculatorContract* cc) 
{

  const auto& options = cc->Options<::mediapipe::PostProcessCalculatorOptions>();

  RET_CHECK(!options.label_map_path().empty())
      << "label_map_path in options is required.";

  // Side packets.
  cc->InputSidePackets().Tag(kScaleZpsTag).Set<ScaleZps>();

  if (cc->Inputs().HasTag(kRknnOutputTag)) 
  {
    // RET_CHECK(cc->Outputs().HasTag(kDetectResult));
    cc->Inputs().Tag(kRknnOutputTag).Set<RknnOutputs>();
    cc->Inputs().Tag(kImageFrameTag).Set<ImageFrame>();
    // cc->Outputs().Tag(kDetectResult).Set<detect_result_group_t>();
    cc->Outputs().Tag(kImageFrameTag).Set<ImageFrame>();

  }

  return absl::OkStatus();
}

char* PostProcessCalculator::ReadLine(FILE* fp, char* buffer, int* len)
{
  int    ch;
  int    i        = 0;
  size_t buff_len = 0;

  buffer = (char*)malloc(buff_len + 1);
  if (!buffer)
    return NULL; // Out of memory

  while ((ch = fgetc(fp)) != '\n' && ch != EOF) {
    buff_len++;
    void* tmp = realloc(buffer, buff_len + 1);
    if (tmp == NULL) {
      free(buffer);
      return NULL; // Out of memory
    }
    buffer = (char*)tmp;

    buffer[i] = (char)ch;
    i++;
  }
  buffer[i] = '\0';

  *len = buff_len;

  // Detect end
  if (ch == EOF && (i == 0 || ferror(fp))) {
    free(buffer);
    return NULL;
  }
  return buffer;
}

int PostProcessCalculator::ReadLines(const char* fileName, char* lines[], int max_line)
{
  FILE* file = fopen(fileName, "r");
  char* s = NULL;
  int   i = 0;
  int   n = 0;

  if (file == NULL) {
    printf("Open %s fail!\n", fileName);
    return -1;
  }

  while ((s = ReadLine(file, s, &n)) != NULL) {
    lines[i++] = s;
    if (i >= max_line)
      break;
  }
  fclose(file);
  return i;
}

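int PostProcessCalculator::LoadLabelName(const char* locationFilename, char* label[])
{
  printf("loadLabelName %s\n", locationFilename);
  // Propagate a read failure so that Open() can report a missing label file.
  int count = ReadLines(locationFilename, label, OBJ_CLASS_NUM);
  return count < 0 ? -1 : 0;
}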

absl::Status PostProcessCalculator::Open(CalculatorContext* cc) 
{

  options_ = cc->Options<::mediapipe::PostProcessCalculatorOptions>();

  // threshold
  box_conf_threshold_ = options_.box_conf_threshold();
  nms_threshold_ = options_.nms_threshold();

  // label name
  int ret = 0;
  ret     = LoadLabelName(options_.label_map_path().c_str(), labels_);
  if (ret < 0) 
  {
    printf("load label name error ret=: %d\n", ret);
    return absl::UnavailableError("load label name error");
  }

  // side packets.
  scale_zps_ = cc->InputSidePackets().Tag(kScaleZpsTag).Get<ScaleZps>();

  return absl::OkStatus();
}

float PostProcessCalculator::CalculateOverlap(float xmin0, float ymin0, float xmax0, float ymax0, float xmin1, float ymin1, float xmax1,
                              float ymax1)
{
  float w = fmax(0.f, fmin(xmax0, xmax1) - fmax(xmin0, xmin1) + 1.0);
  float h = fmax(0.f, fmin(ymax0, ymax1) - fmax(ymin0, ymin1) + 1.0);
  float i = w * h;
  float u = (xmax0 - xmin0 + 1.0) * (ymax0 - ymin0 + 1.0) + (xmax1 - xmin1 + 1.0) * (ymax1 - ymin1 + 1.0) - i;
  return u <= 0.f ? 0.f : (i / u);
}

int PostProcessCalculator::nms(int validCount, std::vector<float>& outputLocations, std::vector<int> classIds, std::vector<int>& order,
               int filterId, float threshold)
{
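  for (int i = 0; i < validCount; ++i) {
    // order holds indices sorted by score, so class ids must be looked up
    // through the sorted index rather than through the loop counter.
    int n = order[i];
    if (n == -1 || classIds[n] != filterId) {
      continue;
    }
    for (int j = i + 1; j < validCount; ++j) {
      int m = order[j];
      if (m == -1 || classIds[m] != filterId) {
        continue;
      }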
      float xmin0 = outputLocations[n * 4 + 0];
      float ymin0 = outputLocations[n * 4 + 1];
      float xmax0 = outputLocations[n * 4 + 0] + outputLocations[n * 4 + 2];
      float ymax0 = outputLocations[n * 4 + 1] + outputLocations[n * 4 + 3];

      float xmin1 = outputLocations[m * 4 + 0];
      float ymin1 = outputLocations[m * 4 + 1];
      float xmax1 = outputLocations[m * 4 + 0] + outputLocations[m * 4 + 2];
      float ymax1 = outputLocations[m * 4 + 1] + outputLocations[m * 4 + 3];

      float iou = CalculateOverlap(xmin0, ymin0, xmax0, ymax0, xmin1, ymin1, xmax1, ymax1);

      if (iou > threshold) {
        order[j] = -1;
      }
    }
  }
  return 0;
}

int PostProcessCalculator::quick_sort_indice_inverse(std::vector<float>& input, int left, int right, std::vector<int>& indices)
{
  float key;
  int   key_index;
  int   low  = left;
  int   high = right;
  if (left < right) {
    key_index = indices[left];
    key       = input[left];
    while (low < high) {
      while (low < high && input[high] <= key) {
        high--;
      }
      input[low]   = input[high];
      indices[low] = indices[high];
      while (low < high && input[low] >= key) {
        low++;
      }
      input[high]   = input[low];
      indices[high] = indices[low];
    }
    input[low]   = key;
    indices[low] = key_index;
    quick_sort_indice_inverse(input, left, low - 1, indices);
    quick_sort_indice_inverse(input, low + 1, right, indices);
  }
  return low;
}

int8_t PostProcessCalculator::qnt_f32_to_affine(float f32, int32_t zp, float scale)
{
  float  dst_val = (f32 / scale) + zp;
  int8_t res     = (int8_t)__clip(dst_val, -128, 127);
  return res;
}

int PostProcessCalculator::process(int8_t* input, int* anchor, int grid_h, int grid_w, int height, int width, int stride,
                   std::vector<float>& boxes, std::vector<float>& objProbs, std::vector<int>& classId, float threshold,
                   int32_t zp, float scale)
{
  int    validCount = 0;
  int    grid_len   = grid_h * grid_w;
  float  thres      = unsigmoid(threshold);
  int8_t thres_i8   = qnt_f32_to_affine(thres, zp, scale);
  for (int a = 0; a < 3; a++) {
    for (int i = 0; i < grid_h; i++) {
      for (int j = 0; j < grid_w; j++) {
        int8_t box_confidence = input[(PROP_BOX_SIZE * a + 4) * grid_len + i * grid_w + j];
        if (box_confidence >= thres_i8) {
          int     offset = (PROP_BOX_SIZE * a) * grid_len + i * grid_w + j;
          int8_t* in_ptr = input + offset;
          float   box_x  = sigmoid(deqnt_affine_to_f32(*in_ptr, zp, scale)) * 2.0 - 0.5;
          float   box_y  = sigmoid(deqnt_affine_to_f32(in_ptr[grid_len], zp, scale)) * 2.0 - 0.5;
          float   box_w  = sigmoid(deqnt_affine_to_f32(in_ptr[2 * grid_len], zp, scale)) * 2.0;
          float   box_h  = sigmoid(deqnt_affine_to_f32(in_ptr[3 * grid_len], zp, scale)) * 2.0;
          box_x          = (box_x + j) * (float)stride;
          box_y          = (box_y + i) * (float)stride;
          box_w          = box_w * box_w * (float)anchor[a * 2];
          box_h          = box_h * box_h * (float)anchor[a * 2 + 1];
          box_x -= (box_w / 2.0);
          box_y -= (box_h / 2.0);

          int8_t maxClassProbs = in_ptr[5 * grid_len];
          int    maxClassId    = 0;
          for (int k = 1; k < OBJ_CLASS_NUM; ++k) {
            int8_t prob = in_ptr[(5 + k) * grid_len];
            if (prob > maxClassProbs) {
              maxClassId    = k;
              maxClassProbs = prob;
            }
          }
          if (maxClassProbs>thres_i8){
            objProbs.push_back(sigmoid(deqnt_affine_to_f32(maxClassProbs, zp, scale))* sigmoid(deqnt_affine_to_f32(box_confidence, zp, scale)));
            classId.push_back(maxClassId);
            validCount++;
            boxes.push_back(box_x);
            boxes.push_back(box_y);
            boxes.push_back(box_w);
            boxes.push_back(box_h);
          }
        }
      }
    }
  }
  return validCount;
}

int PostProcessCalculator::post_process(int8_t* input0, int8_t* input1, int8_t* input2, int model_in_h, int model_in_w, float conf_threshold,
                 float nms_threshold, float scale_w, float scale_h, std::vector<int32_t>& qnt_zps,
                 std::vector<float>& qnt_scales, detect_result_group_t* group)
{
  
  memset(group, 0, sizeof(detect_result_group_t));

  std::vector<float> filterBoxes;
  std::vector<float> objProbs;
  std::vector<int>   classId;

  // stride 8
  int stride0     = 8;
  int grid_h0     = model_in_h / stride0;
  int grid_w0     = model_in_w / stride0;
  int validCount0 = 0;
  validCount0 = process(input0, (int*)anchor0, grid_h0, grid_w0, model_in_h, model_in_w, stride0, filterBoxes, objProbs,
                        classId, conf_threshold, qnt_zps[0], qnt_scales[0]);

  // stride 16
  int stride1     = 16;
  int grid_h1     = model_in_h / stride1;
  int grid_w1     = model_in_w / stride1;
  int validCount1 = 0;
  validCount1 = process(input1, (int*)anchor1, grid_h1, grid_w1, model_in_h, model_in_w, stride1, filterBoxes, objProbs,
                        classId, conf_threshold, qnt_zps[1], qnt_scales[1]);

  // stride 32
  int stride2     = 32;
  int grid_h2     = model_in_h / stride2;
  int grid_w2     = model_in_w / stride2;
  int validCount2 = 0;
  validCount2 = process(input2, (int*)anchor2, grid_h2, grid_w2, model_in_h, model_in_w, stride2, filterBoxes, objProbs,
                        classId, conf_threshold, qnt_zps[2], qnt_scales[2]);

  int validCount = validCount0 + validCount1 + validCount2;
  // no object detect
  if (validCount <= 0) {
    return 0;
  }

  std::vector<int> indexArray;
  for (int i = 0; i < validCount; ++i) {
    indexArray.push_back(i);
  }

  quick_sort_indice_inverse(objProbs, 0, validCount - 1, indexArray);

  std::set<int> class_set(std::begin(classId), std::end(classId));

  for (auto c : class_set) {
    nms(validCount, filterBoxes, classId, indexArray, c, nms_threshold);
  }

  int last_count = 0;
  group->count   = 0;
  /* box valid detect target */
  for (int i = 0; i < validCount; ++i) {
    if (indexArray[i] == -1 || last_count >= OBJ_NUMB_MAX_SIZE) {
      continue;
    }
    int n = indexArray[i];

    float x1       = filterBoxes[n * 4 + 0];
    float y1       = filterBoxes[n * 4 + 1];
    float x2       = x1 + filterBoxes[n * 4 + 2];
    float y2       = y1 + filterBoxes[n * 4 + 3];
    int   id       = classId[n];
    float obj_conf = objProbs[i];

    group->results[last_count].box.left   = (int)(clamp(x1, 0, model_in_w) / scale_w);
    group->results[last_count].box.top    = (int)(clamp(y1, 0, model_in_h) / scale_h);
    group->results[last_count].box.right  = (int)(clamp(x2, 0, model_in_w) / scale_w);
    group->results[last_count].box.bottom = (int)(clamp(y2, 0, model_in_h) / scale_h);
    group->results[last_count].prop       = obj_conf;
    char* label                           = labels_[id];
    strncpy(group->results[last_count].name, label, OBJ_NAME_MAX_SIZE);

    // printf("result %2d: (%4d, %4d, %4d, %4d), %s\n", i, group->results[last_count].box.left,
    // group->results[last_count].box.top,
    //        group->results[last_count].box.right, group->results[last_count].box.bottom, label);
    last_count++;
  }
  group->count = last_count;

  return 0;
}


absl::Status PostProcessCalculator::Process(CalculatorContext* cc) 
{

  // get org mat img
  const auto& input_frame = cc->Inputs().Tag(kImageFrameTag).Get<ImageFrame>();
  cv::Mat input_mat = mediapipe::formats::MatView(&input_frame);
  // cv::imwrite("./out.jpg", input_mat);
  mediapipe::ImageFormat::Format input_format = input_frame.Format();

  // // Allocate memory for the output image
  // std::unique_ptr<mediapipe::ImageFrame> output_frame(
  //     new mediapipe::ImageFrame(input_format, input_mat.cols, input_mat.rows));
  // cv::Mat output_mat = mediapipe::formats::MatView(output_frame.get());
  // output_mat = input_mat.clone();

  // get rknn outputs
  const auto& rknn_output = cc->Inputs().Tag(kRknnOutputTag).Get<RknnOutputs>();

  // post process
  float height = (float)640;
  float width = (float)640;
  float scale_w = width / input_mat.cols;
  float scale_h = height / input_mat.rows;

  detect_result_group_t detect_result_group;

  post_process((int8_t*)rknn_output.outputs[0].buf, (int8_t*)rknn_output.outputs[1].buf, (int8_t*)rknn_output.outputs[2].buf, height, width,
               box_conf_threshold_, nms_threshold_, scale_w, scale_h, scale_zps_.out_zps, scale_zps_.out_scales, &detect_result_group);

  // Draw Objects
  char text[256];
  for (int i = 0; i < detect_result_group.count; i++) 
  {
    detect_result_t* det_result = &(detect_result_group.results[i]);
    sprintf(text, "%s %.1f%%", det_result->name, det_result->prop * 100);
    printf("%s @ (%d %d %d %d) %f\n", det_result->name, det_result->box.left, det_result->box.top,
           det_result->box.right, det_result->box.bottom, det_result->prop);
    int x1 = det_result->box.left;
    int y1 = det_result->box.top;
    int x2 = det_result->box.right;
    int y2 = det_result->box.bottom;
    cv::rectangle(input_mat, cv::Point(x1, y1), cv::Point(x2, y2), cv::Scalar(255, 0, 0, 255), 3);
    cv::putText(input_mat, text, cv::Point(x1, y1 + 12), cv::FONT_HERSHEY_SIMPLEX, 0.6, cv::Scalar(0, 0, 255), 2);
  }

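  // cc->Outputs().Tag(kImageFrameTag).Add(std::move(output_frame.release()), cc->InputTimestamp());
  // NOTE: the boxes above are drawn directly into the input frame, and the frame is then moved
  // out of a const input packet via const_cast. This avoids a copy, but it mutates data that
  // other consumers of the same packet may still read. Copying into a fresh ImageFrame (see the
  // commented-out code above) is the safer option.
  cc->Outputs().Tag(kImageFrameTag).AddPacket(MakePacket<ImageFrame>(std::move(const_cast<ImageFrame&>(input_frame))).At(cc->InputTimestamp()));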

  return absl::OkStatus();
}

void PostProcessCalculator::deinitPostProcess()
{
  for (int i = 0; i < OBJ_CLASS_NUM; i++) 
  {
    if (labels_[i] != nullptr) {
      free(labels_[i]);
      labels_[i] = nullptr;
    }
  }
}
absl::Status PostProcessCalculator::Close(CalculatorContext* cc) 
{
  
  deinitPostProcess();

  return absl::OkStatus();
}

}  // namespace mediapipe
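
One detail of `process` worth spelling out: the confidence threshold is converted into the int8 domain once (`unsigmoid`, then `qnt_f32_to_affine`), so each grid cell can be rejected with a single integer comparison instead of a dequantize plus sigmoid. The following is a minimal standalone sketch of that idea, not the calculator's actual code; the function names are hypothetical stand-ins, and the zero point and scale in the usage note are made-up values.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-ins for the sigmoid/unsigmoid helpers used by the calculator.
inline float Sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }
inline float Unsigmoid(float y) { return -std::log(1.0f / y - 1.0f); }

// float -> int8 affine quantization: q = clip(f / scale + zp, -128, 127).
// The cast truncates toward zero, like the (int8_t) cast in qnt_f32_to_affine.
inline int8_t QuantizeToInt8(float f, int32_t zp, float scale) {
  assert(scale > 0.0f);
  float q = f / scale + static_cast<float>(zp);
  if (q < -128.0f) q = -128.0f;
  if (q > 127.0f) q = 127.0f;
  return static_cast<int8_t>(q);
}

// int8 -> float affine dequantization: f = (q - zp) * scale.
inline float DequantizeToFloat(int8_t q, int32_t zp, float scale) {
  return (static_cast<float>(q) - static_cast<float>(zp)) * scale;
}

// Map a probability threshold (0 < threshold < 1) into the int8 domain once.
// Because sigmoid is monotonic, "sigmoid(dequant(raw)) >= threshold" is
// approximately equivalent to "raw >= thres_i8", up to quantization rounding.
inline int8_t ConfidenceThresholdInt8(float threshold, int32_t zp, float scale) {
  return QuantizeToInt8(Unsigmoid(threshold), zp, scale);
}

// Cheap per-cell check performed on the raw int8 tensor value.
inline bool PassesThreshold(int8_t raw, int8_t thres_i8) {
  return raw >= thres_i8;
}
```

For example, with an assumed zero point of 0 and scale of 0.1, `ConfidenceThresholdInt8(0.5f, 0, 0.1f)` yields 0, so any raw value of 0 or above passes. Only the cells that pass need the more expensive dequantize and sigmoid steps.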

So that the build system can compile the calculator above, add the following to the BUILD file in the mediapipe/calculators/rknn folder:

cc_library(
    name = "post_process_calculator",
    hdrs = ["//mediapipe/framework/formats/rknn:output.h"],
    srcs = ["post_process_calculator.cc"],
    copts = select({
        "//mediapipe:ios": [
            "-x objective-c++",
            "-fobjc-arc",  # enable reference-counting
        ],
        "//conditions:default": [],
    }),
    deps = [
        ":post_process_calculator_cc_proto",
        "//mediapipe/framework:calculator_framework",
        "//mediapipe/framework:packet",
        "//mediapipe/framework:timestamp",
        "//mediapipe/framework/formats:image_frame",
        "//mediapipe/framework/formats:image_frame_opencv",
        "//mediapipe/framework/formats:video_stream_header",
        "//mediapipe/framework/port:opencv_core",
        "//mediapipe/framework/port:opencv_imgproc",
        "//mediapipe/framework/port:opencv_imgcodecs",
        "//mediapipe/framework/port:logging",
        "//mediapipe/framework/port:ret_check",
        "//mediapipe/framework/port:status",
        "//mediapipe/framework/stream_handler:fixed_size_input_stream_handler",
        "@com_google_absl//absl/log:absl_check",
        "@com_google_absl//absl/log:absl_log",
        "@com_google_absl//absl/memory",
        "@com_google_absl//absl/status",
        "@com_google_absl//absl/strings",
    ],
    alwayslink = 1,
)
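
As a side note on the `process` function of the calculator above, the box decoding step can be isolated into a small function. This is a sketch under assumptions: the inputs are taken to be already dequantized floats, and the `Box` struct and `DecodeYolov5Box` name are hypothetical, introduced only for illustration. The arithmetic mirrors the listing: the center is `(sigmoid(t) * 2 - 0.5 + cell index) * stride`, and the size is `(sigmoid(t) * 2)^2 * anchor`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: top-left corner plus size, in model-input pixels.
struct Box {
  float x;
  float y;
  float w;
  float h;
};

// Decode one YOLOv5 prediction for the grid cell at (row, col).
// tx, ty, tw, th are the raw (already dequantized) network outputs.
inline Box DecodeYolov5Box(float tx, float ty, float tw, float th,
                           int col, int row, int stride,
                           int anchor_w, int anchor_h) {
  assert(stride > 0);
  auto sigmoid = [](float v) { return 1.0f / (1.0f + std::exp(-v)); };

  // Box center: offset within the cell, shifted by the cell index,
  // then scaled from grid units to pixels.
  float cx = (sigmoid(tx) * 2.0f - 0.5f + static_cast<float>(col)) * stride;
  float cy = (sigmoid(ty) * 2.0f - 0.5f + static_cast<float>(row)) * stride;

  // Box size: squared scaling factor applied to the anchor dimensions.
  float sw = sigmoid(tw) * 2.0f;
  float sh = sigmoid(th) * 2.0f;
  float w = sw * sw * static_cast<float>(anchor_w);
  float h = sh * sh * static_cast<float>(anchor_h);

  // Convert from center-based to top-left-based coordinates.
  return Box{cx - w / 2.0f, cy - h / 2.0f, w, h};
}
```

With all raw outputs equal to 0 (so every sigmoid is 0.5), a cell at column 2, row 3 on the stride-8 grid with an assumed 10x13 anchor decodes to a 10x13 box centered at (20, 28), that is, a top-left corner of (15, 21.5). Dividing by `scale_w` and `scale_h`, as `post_process` does, then maps the box back to the source image.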

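2.3 Adding the graph

With the required calculators in place, we still need to tell the build system which calculators the executable depends on. Create the file mediapipe/mediapipe/graphs/rknn_yolov5/BUILD with the following content: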

# Copyright 2019 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

load(
    "//mediapipe/framework/tool:mediapipe_graph.bzl",
    "mediapipe_binary_graph",
)

licenses(["notice"])

package(default_visibility = ["//visibility:public"])

cc_library(
    name = "rknn_yolov5_calculators",
    deps = [
        "//mediapipe/calculators/core:concatenate_vector_calculator",
        "//mediapipe/calculators/core:flow_limiter_calculator",
        "//mediapipe/calculators/image:rga_calculator",
        "//mediapipe/calculators/rknn:rknn_yolov5_calculator",
        "//mediapipe/calculators/rknn:post_process_calculator",
        "//mediapipe/calculators/core:previous_loopback_calculator",
        "//mediapipe/calculators/core:split_vector_calculator",
        "//mediapipe/calculators/video:opencv_video_decoder_calculator",
        "//mediapipe/calculators/video:opencv_video_encoder_calculator",
    ],
)

3. Build and Run

Create the folder mediapipe/mediapipe/examples/desktop/rknn_yolov5 and add a BUILD file inside it with the following content:

# Copyright 2019 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

licenses(["notice"])

package(default_visibility = ["//mediapipe/examples:__subpackages__"])

cc_binary(
    name = "rknn_yolov5",
    deps = [
        "//mediapipe/examples/desktop:demo_run_graph_main",
        "//mediapipe/graphs/rknn_yolov5:rknn_yolov5_calculators",
    ],
)

The rknn_yolov5 executable can now be built.

Build:
bazel-6.5.0-linux-arm64 build -c opt --copt=-g --define MEDIAPIPE_DISABLE_GPU=1 mediapipe/examples/desktop/rknn_yolov5:rknn_yolov5

Run:
bazel-bin/mediapipe/examples/desktop/rknn_yolov5/rknn_yolov5 --calculator_graph_config_file=mediapipe/graphs/rknn_yolov5/rknn_yolov5_desktop_live.pbtxt

The result is shown below:
[Image: screenshot of the result]

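Summary

Note: the code in this article is only meant to illustrate the workflow. I have not put effort into its coding style, performance, memory usage, or latent bugs, so take care when using it.

Overall, adding a custom graph and calculators is fairly straightforward. This comes from MediaPipe's modular design: the core architecture is kept separate from the business code, which makes the framework easy to extend while still performing very well.

Later articles will cover some of MediaPipe's concepts, workflows, and design ideas, such as how a graph is built and how nodes are scheduled. If you are interested, stay tuned.

To find out what happens next, tune in for the next installment…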
