ePubs
The open archive for STFC research publications
Full Record Details
Persistent URL
http://purl.org/net/epubs/work/62933
Record Status
Checked
Record Id
62933
Title
Fast triangular solve on GPUs
Contributors
J Hogg (STFC Rutherford Appleton Lab.)
Abstract
The solve phase of a sparse direct solver is memory bound, and in some applications it may be run tens of times for each factorization. As Moore's law continues to deliver more compute power, the compute-bound factorization phase can be performed very quickly, so the performance of the solve phase is increasingly important. Modern GPUs have significantly higher memory bandwidth than modern CPUs and hence provide an attractive target on which to execute the solve phase. The sparse solve phase is typically built from dense triangular solve and matrix-vector multiply operations, implemented as the level 2 BLAS routines _trsv and _gemv, respectively. The current NVIDIA CUBLAS implementation of _trsv is slower than MKL running on the host for all but the largest matrices. In this talk, we describe how to improve performance by an order of magnitude by minimizing memory latency overheads and by using global memory, rather than kernel launches, for synchronization.
Organisation
CSE-NAG, STFC, SCI-COMP
Related Research Object(s):
63053
Language
English (EN)
Type
Presentation
Details
Presented at Parallel Matrix Algorithms and Applications 2012 (PMAA 2012), Birkbeck University, London, UK, 28-30 Jun 2012.
Local file(s)
fast_triangular_solve.pdf
Year
2012