Key research themes
1. How can CUDA implementations of parallel image convolution be optimized to enhance GPU performance?
This theme investigates CUDA implementations of image convolution, a fundamental image-processing operation, focusing on maximizing parallelism, using shared memory efficiently, and reducing idle threads so that GPU resources are fully exploited (see the sketch below). Efficient convolution improves performance across fields such as computer vision, medical imaging, and graphics.
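A minimal sketch of the shared-memory tiling strategy at the heart of this theme, assuming a 16x16 tile and a 3x3 filter held in constant memory; the kernel name, sizes, and clamp-to-edge border handling are illustrative choices, not any particular paper's implementation:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

#define TILE 16   // tile width per block (illustrative choice)
#define R    1    // filter radius, i.e. a 3x3 filter (illustrative choice)

__constant__ float d_filter[(2 * R + 1) * (2 * R + 1)];

// Tiled 2D convolution: each block first stages a (TILE+2R)^2 input patch
// (tile plus halo) in shared memory, so each global pixel is read once per
// block instead of once per output pixel that touches it.
__global__ void conv2d_tiled(const float* in, float* out, int w, int h)
{
    __shared__ float tile[TILE + 2 * R][TILE + 2 * R];

    int ox = blockIdx.x * TILE + threadIdx.x;
    int oy = blockIdx.y * TILE + threadIdx.y;

    // Cooperative, coalesced load of tile + halo; clamp at image borders.
    for (int ty = threadIdx.y; ty < TILE + 2 * R; ty += blockDim.y)
        for (int tx = threadIdx.x; tx < TILE + 2 * R; tx += blockDim.x) {
            int gx = min(max((int)blockIdx.x * TILE + tx - R, 0), w - 1);
            int gy = min(max((int)blockIdx.y * TILE + ty - R, 0), h - 1);
            tile[ty][tx] = in[gy * w + gx];
        }
    __syncthreads();

    if (ox < w && oy < h) {
        float acc = 0.0f;
        for (int fy = 0; fy <= 2 * R; ++fy)
            for (int fx = 0; fx <= 2 * R; ++fx)
                acc += d_filter[fy * (2 * R + 1) + fx]
                     * tile[threadIdx.y + fy][threadIdx.x + fx];
        out[oy * w + ox] = acc;
    }
}

int main()
{
    const int w = 512, h = 512;                           // illustrative size
    float box[9]; for (float& f : box) f = 1.0f / 9.0f;   // 3x3 box blur
    cudaMemcpyToSymbol(d_filter, box, sizeof(box));

    float *din, *dout;
    cudaMalloc(&din,  w * h * sizeof(float));
    cudaMalloc(&dout, w * h * sizeof(float));
    cudaMemset(din, 0, w * h * sizeof(float));  // stand-in for a real image

    dim3 block(TILE, TILE);
    dim3 grid((w + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    conv2d_tiled<<<grid, block>>>(din, dout, w, h);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(din); cudaFree(dout);
}
```

The cooperative halo load is the key design choice: each input pixel is fetched from global memory once per block and then reused up to (2R+1)^2 times from shared memory, and the strided load loop keeps all threads of the block busy during the staging phase.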
2. How can parallel GPU programming models, including CUDA, HIP, and OpenACC, be evaluated and optimized for performance portability across heterogeneous GPU architectures?
This theme focuses on comparative analyses of GPU programming models within and beyond the CUDA ecosystem, emphasizing portability, ease of use, performance tuning, and compatibility with emerging GPU architectures such as AMD Instinct. It explores tools and methodologies for porting CUDA code to HIP and other models (see the sketch below), benchmarking performance trade-offs and compiler toolchains, which is important for developing scalable HPC applications on increasingly diverse GPU hardware.
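A recurring pattern in these comparisons is single-source portability, since HIP deliberately mirrors the CUDA runtime API. The shim below is a hedged sketch: the gpu* macro names are hypothetical, and real ports typically rely on HIP's hipify-clang or hipify-perl translators rather than hand-written mappings:

```cuda
// Hypothetical single-source portability shim. The gpu* names are
// illustrative; hipify tools rewrite cuda* calls to hip* source-to-source.
#if defined(__HIP_PLATFORM_AMD__)
  #include <hip/hip_runtime.h>
  #define gpuMalloc            hipMalloc
  #define gpuFree              hipFree
  #define gpuDeviceSynchronize hipDeviceSynchronize
#else
  #include <cuda_runtime.h>
  #define gpuMalloc            cudaMalloc
  #define gpuFree              cudaFree
  #define gpuDeviceSynchronize cudaDeviceSynchronize
#endif

// The kernel body is identical under CUDA and HIP.
__global__ void scale(float* v, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= a;
}

int main()
{
    const int n = 1 << 20;
    float* d = nullptr;
    gpuMalloc((void**)&d, n * sizeof(float));  // same call site on NVIDIA and AMD
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    gpuDeviceSynchronize();
    gpuFree(d);
}
```

Because the kernel language is shared, benchmarking largely reduces to compiling the same source with nvcc and hipcc and comparing the behavior of the two toolchains and architectures.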
3. What methods improve software intellectual property protection and enable reverse engineering analysis for CUDA applications?
This research cluster explores software protection and forensic reverse engineering techniques for compiled CUDA binaries. Taking NVIDIA's CUDA binary formats and compiler behavior into account, these works analyze static and dynamic reverse engineering strategies and propose best practices for securing CUDA code against intellectual property theft and unauthorized analysis, a critical concern for developers deploying proprietary algorithms on GPUs. One such hardening technique is sketched below.
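One hardening technique often discussed in this space is loading an encrypted kernel module at runtime through the CUDA driver API, so the plaintext GPU code never sits on disk. The sketch below is illustrative only: the XOR "cipher", the toy key, the placeholder blob, and the kernel name my_kernel are all assumptions, not any surveyed paper's scheme; a real deployment would embed an actual encrypted cubin and use proper cryptography:

```cuda
#include <cuda.h>       // CUDA driver API; link with -lcuda
#include <cstdint>
#include <cstdio>
#include <vector>

// Placeholder blob: in a real deployment this would be a cubin or PTX image
// (e.g. produced by `nvcc --cubin`) encrypted at build time. A single byte
// stands in here, so the load below fails gracefully when run as-is.
static const uint8_t g_encrypted_module[] = { 0x00 };
static const uint8_t kKey = 0x5A;   // toy XOR key, illustration only

int main()
{
    cuInit(0);
    CUdevice dev;  cuDeviceGet(&dev, 0);
    CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

    // Decrypt the module image in memory immediately before loading, so the
    // plaintext kernel code never exists as a file on disk.
    std::vector<uint8_t> image(g_encrypted_module,
                               g_encrypted_module + sizeof(g_encrypted_module));
    for (uint8_t& b : image) b ^= kKey;

    CUmodule mod;
    CUresult rc = cuModuleLoadData(&mod, image.data());
    if (rc != CUDA_SUCCESS) {
        printf("module load failed (%d); expected with the placeholder blob\n",
               (int)rc);
        cuCtxDestroy(ctx);
        return 1;
    }

    // With a real module image: resolve the kernel by name and launch it.
    CUfunction fn;
    cuModuleGetFunction(&fn, mod, "my_kernel");  // "my_kernel" is hypothetical
    // ... cuLaunchKernel(fn, gx, gy, gz, bx, by, bz, 0, nullptr, args, nullptr);

    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
}
```

The same decrypt-then-cuModuleLoadData path is also what the reverse engineering side targets: a dynamic analyst can intercept the in-memory image at load time, which is why these works treat static obfuscation and runtime protection as complementary.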
4. How can GPU-accelerated frameworks like RAPIDS and multi-GPU CUDA programming improve data-parallel machine learning workloads?
This theme examines integrating GPU-accelerated libraries with CUDA-enabled hardware to optimize parallel machine learning workflows. It focuses on leveraging multi-GPU distributed training, data parallelism, and pipeline parallelism through frameworks such as RAPIDS and Dask, quantifying scalability, speedups, and communication overhead to demonstrate practical improvements in big data and AI applications. The underlying multi-GPU CUDA idiom is sketched below.
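RAPIDS and Dask expose this parallelism from Python, but the multi-GPU CUDA idiom underneath is one shard, one device, one stream. A minimal data-parallel sketch, with a trivial elementwise kernel standing in (by assumption) for a real training or ETL workload:

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>
#include <vector>

// Trivial elementwise kernel standing in for a real per-shard workload.
__global__ void scale_shard(float* x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main()
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev == 0) { printf("no CUDA devices\n"); return 1; }

    const int n = 1 << 24;
    std::vector<float> host(n, 1.0f);
    const int shard = (n + ndev - 1) / ndev;   // contiguous shard per GPU

    std::vector<float*> dbuf(ndev);
    std::vector<cudaStream_t> stream(ndev);

    // Data parallelism: each GPU owns one shard and one stream, so copies
    // and kernels for different shards overlap across devices.
    for (int d = 0; d < ndev; ++d) {
        const int lo = d * shard, len = std::min(shard, n - lo);
        cudaSetDevice(d);
        cudaStreamCreate(&stream[d]);
        cudaMalloc(&dbuf[d], len * sizeof(float));
        cudaMemcpyAsync(dbuf[d], host.data() + lo, len * sizeof(float),
                        cudaMemcpyHostToDevice, stream[d]);
        scale_shard<<<(len + 255) / 256, 256, 0, stream[d]>>>(dbuf[d], 2.0f, len);
        cudaMemcpyAsync(host.data() + lo, dbuf[d], len * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[d]);
    }

    // Join all shards, then release per-device resources.
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaStreamSynchronize(stream[d]);
        cudaFree(dbuf[d]);
        cudaStreamDestroy(stream[d]);
    }
    printf("host[0] = %.1f (expect 2.0)\n", host[0]);
}
```

Per-device streams let the host-to-device copies, kernels, and device-to-host copies of different shards proceed concurrently; production code would additionally use pinned host memory and a library such as NCCL for the inter-GPU communication whose overhead these studies quantify.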