Real-Time Implementation of an AVS (D1) System on the DaVinci Platform: Inter-Frame Coding Algorithm


Taiyuan University of Technology Master's Thesis
Real-Time Implementation of an AVS (D1) System on the DaVinci Platform: Inter-Frame Coding Algorithm
Name: Duan Qiaojuan (段巧娟)
Degree applied for: Master
Major:
Supervisor: Zhang Gang (张刚)

Abstract

AVS is China's first digital audio/video coding standard with independent intellectual property rights. It has reached an internationally advanced level in technology and performance, and it is the foundational standard shared by major audio/video applications such as high-definition digital television, network television, and video communication. The bottleneck limiting adoption of the AVS standard is that real-time implementation of the encoding algorithm has not yet achieved a substantive breakthrough.

The DaVinci (TMS320DM6446) platform is a high-performance digital signal processor developed by TI for next-generation embedded networked multimedia devices. The chip has a DSP+ARM dual-core architecture plus a video image coprocessor (VICP) that can run in parallel with the DSP, and it integrates feature-rich video front-end and back-end subsystems, which makes it an ideal platform for implementing the AVS encoding algorithm.

This thesis studies the core techniques of the AVS coding standard and uses the TMS320DM6446 hardware platform to design a solution for real-time inter-frame encoding at D1 (720×576) resolution. The following work was completed:

Improving the cache hit rate. First, the encoding data were reorganized to reduce the amount of data handled at a time: an improved skip mode separates the encoding of luma blocks from that of chroma blocks; restricting the search range shrinks the reference area of the current block to the 24×24 pixels around the macroblock, which saves the data volume of a whole reference frame; the half-pixel interpolation algorithm was changed so that interpolation is no longer performed on the full frame but only on the current block when inter prediction needs it, which saves data space; large structures in the program were split up; and, because encoding proceeds macroblock by macroblock, only the data and key information related to the current macroblock are placed in internal memory, which raises the data cache hit rate. Second, the program flow was changed: I frames and P frames are encoded separately; luma and chroma are encoded separately; and entropy coding is performed at the end of each frame. This keeps execution as linear as possible and forms a processing chain, which raises the program cache hit rate.

Multiprocessor parallelism. When the VICP handles the SAD computation, the time consumed is 0.05% of that on the DSP. The thesis therefore uses the VICP for the complex, time-consuming search part of inter prediction while the DSP works in parallel, which saves a large amount of time.

All-zero block detection. This is an efficient coding technique that skips the whole chain of transform, quantization, dequantization, and inverse transform. The thesis independently derives an all-zero block threshold and applies it throughout the algorithm. All-zero block detection raises the frame rate by 20%, and by more than 50% for slow-moving sequences.

With these optimizations, the encoding frame rate for D1 images rose from an initial 0.47 fps to 3.84 fps, an efficiency gain of 7 times.

Keywords: AVS, TMS320DM6446, D1, inter-frame coding, real-time implementation

THE REALIZATION OF AVS (D1) REAL-TIME SYSTEM BASED ON DAVINCI: INTER CODING

ABSTRACT

AVS is China's first digital audio/video encoding and decoding standard with independent intellectual property rights. The standard has reached an internationally advanced level in technology and performance, and it is the common basic standard adopted by important applications such as high-definition digital television, network television, and video communication. The bottleneck restricting the popularization of the AVS standard is that the real-time implementation of the encoding algorithm has not yet seen any real breakthrough.

The TMS320DM6446 platform is a high-performance digital signal processor aimed at the next generation of embedded networked multimedia devices. The chip has a DSP+ARM dual-core architecture and a video image coprocessor (VICP) that can process in parallel with the DSP, and it also integrates video front-end and back-end processing systems. It is an ideal platform for realizing the AVS encoding algorithm. Based on the core technology of the AVS encoding standard, this paper presents a design for real-time AVS encoding at D1 (720×576) resolution on the DSP, taking into account the hardware structure of the TMS320DM6446 platform. The paper concentrates on the inter-frame encoding algorithm, and the following tasks were completed:

Improving the cache hit rate. Because of limited hardware resources, real-time encoding with the AVS algorithm at D1 (high) resolution on an embedded system is very difficult. The use of the cache and the proper allocation of data are the key technologies. To increase the cache hit rate, the encoding data are first changed to decrease the amount of data processed at a time. With the improved skip mode, the encoding of luma blocks and chroma blocks is separated. By restricting the search area, the reference area of the current block is reduced to the 24×24 pixels surrounding the macroblock; the search is executed within this area, so the data space of a whole reference frame is saved. Replacing whole-frame interpolation with half-pixel interpolation of the current block, performed when inter-frame prediction demands it, saves the space of one frame of half-pixel data. Large structures in the program are split. Since encoding is executed macroblock by macroblock, with the above methods the data and important information related to the current macroblock are put into internal memory to improve the data cache hit rate. Second, the program flow is changed: I frames and P frames are processed separately, luma encoding and chroma encoding are processed separately, and entropy encoding is performed at the end of every frame. This arrangement makes the program execute as linearly as possible and forms a processing chain that improves the program cache hit rate.

Multiprocessor parallelism. With the VICP library functions, SAD computation takes 0.05% of the time it takes on the DSP, so in this paper the complicated and time-consuming search part of inter-frame prediction is handled by the VICP while the DSP keeps working. This saves a lot of time.

All-zero block detection. This method is an efficient encoding technique that saves the series of DCT, quantization, dequantization, and inverse DCT operations.
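As a minimal sketch of the all-zero block idea, the Python below decides from the residual's SAD alone whether transform and quantization can be skipped. Several things here are assumptions rather than details from the thesis: an orthonormal 8×8 floating-point DCT stands in for the AVS integer transform, the quantizer is a plain uniform one with step `qstep` and rounding offset `offset`, and the threshold `4 * (1 - offset) * qstep` is a generic sufficient bound (every 2-D basis value of this DCT has magnitude below 0.25), not the threshold derived by the author. All function names are hypothetical.

```python
import math

N = 8  # block size assumed for this sketch


def dct_basis(u, x):
    """Value of the u-th orthonormal 1-D DCT-II basis function at sample x."""
    scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * x + 1) * u * math.pi / (2 * N))


def dct2(block):
    """Direct (slow) 2-D DCT of an N x N block; rows index vertical frequency."""
    return [
        [
            sum(block[y][x] * dct_basis(v, y) * dct_basis(u, x)
                for y in range(N) for x in range(N))
            for u in range(N)
        ]
        for v in range(N)
    ]


def quantise(coefs, qstep, offset=0.5):
    """Uniform quantizer: level = sign(c) * floor(|c| / qstep + offset)."""
    return [
        [int(math.copysign(math.floor(abs(c) / qstep + offset), c)) for c in row]
        for row in coefs
    ]


def is_all_zero_block(residual, qstep, offset=0.5):
    """Sufficient test: |coef| <= 0.25 * SAD, so SAD below this bound
    guarantees every quantized level is zero (assumed threshold, see lead-in)."""
    sad = sum(abs(v) for row in residual for v in row)
    return sad < 4.0 * (1.0 - offset) * qstep


# A low-energy residual (SAD = 16) and a high-energy one (SAD = 640).
quiet = [[1 if (x + y) % 4 == 0 else 0 for x in range(N)] for y in range(N)]
busy = [[10] * N for _ in range(N)]

for name, block in (("quiet", quiet), ("busy", busy)):
    if is_all_zero_block(block, qstep=16):
        print(name, "-> skipped transform and quantization")
    else:
        print(name, "-> full path, DC level", quantise(dct2(block), 16)[0][0])
```

With `qstep = 16` the threshold is 32, so `quiet` is flagged without computing any transform, while `busy` goes through the full path. A threshold tuned to the real AVS transform and quantizer would generally be less conservative than this bound.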

With all-zero block detection, the frame rate can be improved by 20%, and even by 50% or more for low-motion images. This paper derives the all-zero block threshold and applies it to the whole algorithm flow. With all of the above optimizations, the frame rate improves from the original 0.47 fps to 3.84 fps, an increase of about 7 times.

KEY WOR
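The restricted search area mentioned in the abstract can be illustrated with an exhaustive SAD search inside a 24×24 reference window. This is only a sketch under stated assumptions: the macroblock is taken to be 16×16, so a 24×24 window corresponds to a ±4 integer-pixel search range (the abstract gives only the window size); a plain full search is used, which may differ from the search pattern in the thesis; and the code runs as ordinary Python rather than through the VICP library, whose API is not shown in this text. The function names are hypothetical.

```python
def sad(cur, ref, rx, ry, size=16):
    """Sum of absolute differences between `cur` and the size x size block
    of `ref` whose top-left corner is at column rx, row ry."""
    return sum(
        abs(cur[y][x] - ref[ry + y][rx + x])
        for y in range(size)
        for x in range(size)
    )


def full_search(cur, window, size=16, search=4):
    """Exhaustive integer-pixel search inside a (size + 2*search)-square window.

    `window` is the reference area centred on the current macroblock, so the
    zero-motion candidate sits at offset (search, search).
    Returns (best_sad, (dx, dy)); ties keep the first candidate visited.
    """
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cost = sad(cur, window, search + dx, search + dy, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Synthetic 24 x 24 reference window with a non-repeating gradient pattern.
window = [[(7 * x + 13 * y) % 256 for x in range(24)] for y in range(24)]

# Current macroblock: the window content displaced by (dx, dy) = (2, -1).
cur = [row[4 + 2:4 + 2 + 16] for row in window[4 - 1:4 - 1 + 16]]

best_sad, motion_vector = full_search(cur, window)
print(best_sad, motion_vector)
```

Only the 24×24 window has to be resident while a macroblock is searched, which is the memory saving the abstract attributes to the restricted search range; the 81 candidate SADs are the part the thesis offloads to the VICP.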
