Research on Acceleration Techniques for Pixel Matching Operations in Video and Image Processing


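National University of Defense Technology (国防科学技术大学), Doctoral Dissertation
Author: Gu Huitao (谷会涛)
Degree applied for: Doctorate
Major: Electronic Science and Technology
Advisor: Chen Shuming (陈书明)
Date: 2011-04
Graduate School of National University of Defense Technology

Abstract (Chinese version)

Pixel matching operations are widely used in video and image processing across communications, medicine, education, the military and many other fields. Typical pixel matching operations include motion estimation in video coding and correlation matching in image matching and in target tracking and recognition. As video and image technology develops rapidly, image resolution and frame rate keep rising and the computational complexity of pixel matching grows with them, which poses a major challenge to the performance of digital signal processors. Research on acceleration techniques for pixel matching is therefore of theoretical and practical importance for improving signal processor performance and meeting the real-time requirements of video and image processing.

Pixel matching is computationally complex, data-intensive and subject to tight real-time constraints, and hardware accelerators are an effective way to perform it in real time. This thesis studies accelerators for two typical pixel matching techniques in video and image processing: motion estimation and correlation matching. It examines acceleration techniques from three angles: improving the performance and flexibility of the accelerator, optimizing the algorithm the accelerator executes, and improving the transfer efficiency and flexibility of the interface between accelerator and processor. Detailed performance analysis and evaluation are carried out on a DSP platform. The main work and contributions are as follows:

1) A multi-search-center fast algorithm suited to hardware implementation is proposed. The algorithm is built on multi-search-center prediction and dynamic adjustment of the search range. Multi-search-center prediction analyzes the motion vectors of adjacent blocks and can predict the several motion trends contained in the current macroblock. Compared with traditional prediction methods, it improves prediction accuracy by up to about 12.9%. The algorithm then adjusts the search range dynamically according to the number and magnitude of the predicted motion vectors, which further reduces computation time. Compared with the FFS, UMHexagonS and EPZS algorithms, the proposed algorithm achieves similar rate-distortion performance while saving on average 89.9%-98.4%, 46.5%-67.9% and 20.0%-46.8% of the computation time, respectively. Its search procedure resembles that of FFS, so the computation is regular and well suited to a hardware accelerator.

2) A motion estimation coprocessor architecture supporting multiple standards is proposed. The coprocessor uses a 6-issue very long instruction word (VLIW) architecture and can execute a variety of motion estimation algorithms. It contains a processing element array with two-dimensional data reuse, an adder tree and a multi-mode coding cost comparator. The processing element array and the adder tree provide ample computing power for motion estimation, and the multi-mode cost comparator supports the different block partition modes. Compared with other motion estimation accelerators, the coprocessor not only meets the motion estimation demands of real-time high-definition encoding but is also highly flexible and supports multiple motion estimation algorithms.

3) A multi-standard, configurable sub-pixel interpolation accelerator architecture is proposed. It contains two independent 8-tap interpolation units that perform horizontal and vertical interpolation on pixel frames. Through a coefficient memory, the filter coefficients for any 1/2 and 1/4 pixel position can be configured, which supports the various sub-pixel interpolation methods. The two interpolation units use a frame-pipelined two-step strategy that reduces the amount of computation by about 46%. Compared with previous work, the architecture delivers higher performance in a smaller chip area. Running at 250 MHz, it meets the requirements of real-time sub-pixel interpolation for high-definition video.

4) An efficient correlation matching accelerator architecture is proposed. A processing element array and an adder tree form the core computing structure, and the pipeline is finely partitioned to raise the operating frequency and computing speed. Optimizing the structure of the processing element array and the organization of the memory effectively reduces the area and power of the accelerator. Among the accelerators compared, the proposed architecture has the highest efficiency. For real-time images of size 64×64 sampled at 60 fps, it can match up to 162 template images.

5) A custom processor core interface for connecting hardware accelerators is proposed. From interface information described by the user, the interface automatically generates protocol conversion logic and quickly supports different interface protocols such as DMA, AHB and PVCI, which improves the reusability of hardware accelerators. Through this interface, the processor core can execute accelerator move instructions and exchange data with the hardware accelerator directly at high bandwidth. Compared with the conventional DMA bus, the proposed structure saves up to 83.7% and 87.1% of the transfer time, respectively.

Keywords: pixel matching; hardware accelerator; motion estimation; sub-pixel interpolation; correlation matching; automatic interface generation

Abstract (English version)

Pixel matching is widely used in communication, medical care, education and the military domain. Typical pixel matching computations include motion estimation in video coding and correlation matching in object tracking and recognition. With the rapid development of video and image processing applications, image resolution and frame rate keep increasing and the computational complexity of pixel matching grows greatly, presenting a great challenge to the performance of digital signal processors. Research on accelerating pixel matching computation is therefore significant for improving processor performance and meeting the performance requirements of real-time video and image processing.

Pixel matching is computation-intensive, data-intensive and real-time. A hardware accelerator is an efficient way to implement real-time pixel matching computation. Motion estimation and correlation matching, the classic pixel matching operations in video and image processing, are studied in this thesis. Several acceleration techniques are explored to improve the flexibility and performance of motion estimation and correlation matching hardware accelerators, to optimize the fast algorithm implemented in hardware, and to increase the transmission efficiency and flexibility of the interface between hardware accelerator and processor. Detailed performance analysis and evaluation of every technique are carried out on our DSP platform.

The main contributions and innovations of this thesis are as follows:

1) A multiple-search-center motion estimation algorithm suitable for hardware implementation is proposed to speed up the computation. The algorithm is based on multi-search-center prediction and dynamic search range adjustment. The multi-search-center prediction analyzes the motion vectors of spatially and temporally adjacent blocks and predicts multiple motion vectors for the current block. Compared with traditional motion vector prediction, the proposed method improves prediction accuracy by up to 12.9%. According to the count and magnitude of the predicted search centers, the search range is dynamically adjusted to further reduce the computational complexity. Compared with the FFS, UMHexagonS and EPZS algorithms, the proposed algorithm achieves similar rate-distortion performance while reducing computational complexity by about 89.9%-98.4%, 46.5%-67.9% and 20.0%-46.8%, respectively. As with FFS, the search method of the proposed algorithm is easy to implement in hardware because its computation is regular.

2) A motion estimation coprocessor supporting multiple coding standards is presented. The coprocessor is based on a very long instruction word (VLIW) architecture and can effectively perform various motion estimation algorithms. The proposed hardware architecture employs a processing element array with two-dimensional data reuse, a SAD tree structure and a multi-mode cost comparator. The processing element array and the SAD tree structure efficiently meet the huge computational demands of motion estimation, and the multi-mode cost comparator ...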
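The pixel matching operation that both abstracts revolve around, block matching with a sum of absolute differences (SAD) cost searched around several candidate centers, can be sketched in a few lines. This is only an illustrative sketch and not the thesis's algorithm or hardware: the function names `sad`, `search_window` and `multi_center_search` are hypothetical, frames are assumed to be lists of rows of integer pixel values, and the search centers and radius are simply passed in by the caller, whereas the thesis predicts the centers from neighboring motion vectors and adapts the range dynamically.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the candidate motion vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def search_window(cur, ref, bx, by, n, center, radius):
    """Exhaustively test every displacement within `radius` of `center`.
    Returns (cost, (dx, dy)) for the best candidate, or None if no
    candidate block lies fully inside the reference frame."""
    height, width = len(ref), len(ref[0])
    cx, cy = center
    best = None
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            inside = (0 <= bx + dx and bx + dx + n <= width and
                      0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue  # reference block would fall outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


def multi_center_search(cur, ref, bx, by, n, centers, radius):
    """Run a small regular search around each supplied center and keep the
    lowest-cost result. Centers are an input here (an assumption of this
    sketch); predicting them is outside its scope."""
    results = [search_window(cur, ref, bx, by, n, c, radius) for c in centers]
    results = [r for r in results if r is not None]
    return min(results) if results else None
```

Each window is scanned in the same regular raster order as a full search, which is the property the abstract credits for making the approach easy to map onto a processing element array; the saving comes from using a few small windows instead of one large one.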
