ePubs
The open archive for STFC research publications
Full Record Details
Persistent URL
http://purl.org/net/epubs/work/62682
Record Status
Checked
Record Id
62682
Title
_TRSV: optimizing triangular solve in CUDA
Contributors
J Hogg (STFC Rutherford Appleton Lab.)
Abstract
The _trsv level 2 BLAS routine solves the linear system Lx = b for some lower triangular matrix L. It is an important kernel in the solution phase of a direct linear solver, and is often run repeatedly for iterative refinement. The current CUBLAS implementation of _trsv fails to beat the host MKL performance even on many large matrices. In this talk we describe how to improve performance by an order of magnitude through minimizing memory latency overheads and the use of global memory rather than kernel launches for synchronization. These techniques may be of use in other problems where high performance is required but only small amounts of data are used each iteration.
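As a point of reference for the operation the abstract describes, the following is a minimal sketch of solving Lx = b by forward substitution for a dense lower triangular L. It is a plain serial illustration, not the CUDA implementation presented in the talk; the function name `trsv_lower` and the list-of-rows matrix layout are assumptions made for this example only.

```python
def trsv_lower(L, b):
    """Solve L x = b for a dense lower triangular L (list of rows).

    Hypothetical helper for illustration; not a BLAS or CUBLAS API.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Start from the right-hand side and subtract the contributions
        # of the unknowns x[0..i-1] that have already been computed.
        s = b[i]
        for j in range(i):
            s -= L[i][j] * x[j]
        if L[i][i] == 0:
            raise ZeroDivisionError("zero on the diagonal: L is singular")
        x[i] = s / L[i][i]
    return x


L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
b = [2.0, 7.0, 32.0]
print(trsv_lower(L, b))  # [1.0, 2.0, 3.0]
```

Each `x[i]` depends on every earlier entry of `x`, so the rows cannot all be processed independently. A parallel version has to synchronize between dependent steps, which is why the cost of synchronization matters for this kernel.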
Organisation
CSE-NAG, STFC, SCI-COMP
Keywords
CUDA, GPU, Linear Algebra
Funding Information
Related Research Object(s):
Licence Information:
Language
English (EN)
Type
Presentation
Details
Presented at Oxford e-Research centre Many-Core Seminar Series, Oxford, UK, 23 May 2012.
URI(s)
Local file(s)
dtrsv.pdf
Year
2012