Research on GPU Parallel Optimization of the Adaptive Gaussian Mixture Background Modeling Algorithm


Huazhong University of Science and Technology, Master's Thesis
Research on GPU Parallel Optimization of the Adaptive Gaussian Mixture Background Modeling Algorithm
Name: Zhong Junjie (钟俊杰)
Degree applied for: Master
Major: Computer Architecture
Advisor: Guo Hongxing (郭红星)
2011-01-16

摘要 (Abstract)

Moving object detection is the foundation of video tracking and analysis, and one of its first and most critical tasks is to identify the moving objects in a video sequence. Background subtraction is currently the most widely used way to do this: the current video frame, which contains the moving objects, is differenced against a background reference frame, and pixel regions with a large difference are marked as moving objects. Among background modeling methods, Gaussian mixture modeling is generally recognized as offering both good detection quality and good adaptability, but it requires a very large amount of computation and is hard to run in real time. The graphics processing unit (GPU), with its large number of stream computing units, offers a new platform for accelerating this kind of application. Exploiting the parallelism of the background modeling algorithm on the GPU and optimizing it for real-time performance therefore matters both for widening its range of application and for reducing deployment cost.

Using the CUDA environment on the GPU platform, this thesis parallelizes the adaptive Gaussian mixture background modeling algorithm in two ways: thread-level parallelism and asynchronous stream processing. Thread-level parallelization uses a CUDA kernel function to map the background update of each pixel in the original algorithm onto one stream processing unit of the GPU, so that many threads execute in parallel and the computation is accelerated. The asynchronous stream optimization borrows the stream-computing idea of computing while transferring, and speeds up the computation by hiding the latency of data transfer. It uses the stream concept of the CUDA programming model: several streams are created so that data transfer and computation in different streams can overlap, which improves overall performance. In addition, the model parameters of each pixel are stored in blocks in row-major order to match the data needs of the kernel functions during multi-stream processing, so that each kernel can access the data it needs in time.

With CUDA thread-level parallelization, tests on videos with resolutions of 384×288, 640×272, 720×576, 1280×720 and 1920×1080 show that the average modeling time is reduced by 40.932 ms, 94.656 ms, 228.012 ms, 547.759 ms and 861.459 ms respectively in Debug mode, and by 10.362 ms, 33.421 ms, 71.594 ms, 173.609 ms and 156.02 ms respectively in Release mode. On this basis, with 8 data streams taken as the reference configuration, tests of the asynchronous stream optimization in the Release build show that the average modeling time at the five resolutions is reduced by a further 2.640 ms, 3.769 ms, 10.703 ms, 19.331 ms and 55.335 ms respectively. Thread-level parallelization and asynchronous stream optimization on the GPU can therefore substantially accelerate the Gaussian mixture background modeling algorithm.

This work was supported by the National Natural Science Foundation of China project "Adaptive Mechanisms and Cross-Layer Optimization for Embedded Multimedia Stream Computing" (No. 60873029) and by the Independent Innovation Research Fund of Huazhong University of Science and Technology (No. 2010MS014).

Keywords: Gaussian mixture background modeling, stream computing, thread-level parallelism, graphics processing unit, parallel programming model

Abstract

Moving target detection is the basis of moving image tracking and image analysis, in which a basic and crucial task is to determine the desired moving targets. Background subtraction is the most common approach to target detection: the corresponding background frame is subtracted from a video frame, and the regions with a relatively large difference are labeled as moving targets. Among background modeling methods, Gaussian mixture modeling is considered a good method with high performance in both detection capability and adaptability. Nevertheless, it is difficult to implement in real time because of its high computational cost. Fortunately, emerging Graphics Processing Units (GPUs) provide a new platform for its implementation, because the many Stream Processor (SP) units of a GPU can be used to accelerate the computation. It is therefore worthwhile to optimize background modeling by exploiting its parallelism on GPUs, which helps to extend its range of application and to reduce cost.

With the CUDA programming environment on GPUs, an adaptive Gaussian mixture background modeling algorithm is parallelized in two respects: thread-level parallelism and asynchronous stream processing. Thread-level parallelism is carried out by mapping the background update of each pixel onto a Stream Processor as a thread, executed through a CUDA kernel function. These threads run simultaneously, which achieves parallel execution and acceleration. The asynchronous stream processing optimization draws on the idea of stream computing, which schedules computation and the corresponding data access in parallel to hide the latency of data access. By creating multiple streams in the CUDA programming model, computing performance is improved because data access and computation in different streams can overlap. Meanwhile, the model parameters of each pixel are organized and stored in blocks in row-major order, so as to facilitate data access by the kernel functions during multi-stream parallel processing.

Video sequences with resolutions of 384×288, 640×272, 720×576, 1280×720 and 1920×1080 are used to test the performance of the CUDA thread-level parallel optimization. The experimental results indicate that the average background modeling time is reduced by 40.932 ms, 94.656 ms, 228.012 ms, 547.759 ms and 861.459 ms respectively in Debug mode, and by 10.362 ms, 33.421 ms, 71.594 ms, 173.609 ms and 156.02 ms respectively in Release mode. On this basis, the asynchronous stream processing optimization, with 8 data streams taken as a typical reference, further reduces the average background modeling time by 2.64 ms, 3.769 ms, 10.703 ms, 19.331 ms and 55.335 ms respectively in Release mode. It can be concluded that Gaussian mixture background modeling can be sped up significantly through the thread-level parallelism and asynchronous stream processing optimizations.

This thesis is supported by the National Natural Science Foundation of China (No. 60873029) and the Innovation Research Foundation of Huazhong University of Science and Technology (No. 2010MS014).

Key words: Gaussian Mixture Background Modeling,
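
The two decompositions described in the abstract can be illustrated with a small CPU-side sketch in Python. It is not the thesis's CUDA code: the per-pixel rule below is a simplified, single-channel mixture update in the classic Stauffer-Grimson style (one learning rate for weights, means and variances), not necessarily the adaptive variant the thesis uses, and every name and constant (`new_model`, `update_pixel`, `split_rows`, `update_frame`, `K`, `ALPHA`, `INIT_VAR`, `MIN_VAR`, `BG_WEIGHT`) is a hypothetical choice made for illustration. What it is meant to show is that each pixel's update touches only that pixel's parameters, so it can be one GPU thread, and that a row-major layout lets a frame be cut into contiguous row blocks, one per stream.

```python
K = 3             # Gaussians per pixel (assumed)
ALPHA = 0.05      # learning rate (assumed)
INIT_VAR = 225.0  # variance given to a new component (assumed)
MIN_VAR = 1.0     # variance floor (assumed)
BG_WEIGHT = 0.7   # share of total weight treated as background (assumed)


def new_model(n_pixels):
    """Row-major list of per-pixel mixtures; each component is [weight, mean, variance]."""
    return [[[1.0 / K, 0.0, INIT_VAR] for _ in range(K)] for _ in range(n_pixels)]


def update_pixel(comps, x):
    """Update one pixel's mixture in place and return True if x is foreground.

    On a GPU this function body would be the work of a single thread.
    """
    # Rank components by weight / sigma, most background-like first.
    comps.sort(key=lambda c: c[0] / c[2] ** 0.5, reverse=True)
    matched = None
    for i, c in enumerate(comps):
        if (x - c[1]) ** 2 <= 6.25 * c[2]:  # within 2.5 standard deviations
            matched = i
            break
    # Decay every weight; only the matched component gains ALPHA.
    for i, c in enumerate(comps):
        c[0] = (1.0 - ALPHA) * c[0] + (ALPHA if i == matched else 0.0)
    if matched is None:
        # No component explains x: replace the lowest-ranked one.
        comps[-1][:] = [ALPHA, float(x), INIT_VAR]
    else:
        c = comps[matched]
        d = x - c[1]
        c[1] += ALPHA * d
        c[2] = max((1.0 - ALPHA) * c[2] + ALPHA * d * d, MIN_VAR)
    total = sum(c[0] for c in comps)
    for c in comps:
        c[0] /= total
    if matched is None:
        return True
    # Foreground if the components ranked ahead already cover BG_WEIGHT.
    return sum(c[0] for c in comps[:matched]) > BG_WEIGHT


def split_rows(height, n_streams):
    """Split rows 0..height into contiguous (start, stop) ranges, at most one per stream."""
    base, extra = divmod(height, n_streams)
    ranges, start = [], 0
    for i in range(n_streams):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:
            ranges.append((start, stop))
        start = stop
    return ranges


def update_frame(model, frame, width, height, n_streams=8):
    """Process the frame block by block; return a row-major foreground mask."""
    mask = [False] * (width * height)
    for start, stop in split_rows(height, n_streams):
        # With CUDA streams, this block's copy and kernel launch would be
        # issued asynchronously so that they overlap with other blocks' work.
        for idx in range(start * width, stop * width):
            mask[idx] = update_pixel(model[idx], frame[idx])
    return mask
```

Because the pixels are independent, the result does not depend on how many blocks the frame is cut into; only the scheduling changes. All five test resolutions listed in the abstract have heights divisible by 8 (288, 272, 576, 720 and 1080), so with 8 streams `split_rows` yields equal blocks of 36, 34, 72, 90 and 135 rows respectively.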
