Section: 29.9.14 [linalg.algs.blas2], 29.9.15 [linalg.algs.blas3] Status: New Submitter: Mark Hoemmen Opened: 2024-08-08 Last modified: 2024-08-11
Priority: Not Prioritized
View all issues with New status.
Discussion:
As pointed out by Raffaele Solcà (CSCS Swiss National Supercomputing Centre), some of the Mandates, Preconditions, and Complexity elements of some BLAS 2 and BLAS 3 algorithms in [linalg] are incorrect.
Proposed resolution:
This wording is relative to N4988.
Modify 29.9.14.1 [linalg.algs.blas2.gemv] as indicated:
[Drafting note: This change is needed because the matrix
A
does not need to be square.x.extents(0)
must equalA.extents(1)
, whiley.extents(0)
must equalA.extents(0)
.]
-3- Mandates:
(3.1) —
possibly-multipliable<decltype(A), decltype(x), decltype(y)>()
istrue
, and(3.2) —
possibly-addable<decltype(y
isx), decltype(y), decltype(z)>()true
for those overloads that take az
parameter.-4- Preconditions:
(4.1) —
multipliable(A, x, y)
istrue
, and(4.2) —
addable(y
isx, y, z)true
for those overloads that take az
parameter.-5- Complexity: 𝒪(
A
×x.extent(0)x
).A.extent(01)
Modify 29.9.14.2 [linalg.algs.blas2.symv] as indicated:
-3- Mandates:
(3.1) — […]
(3.2) — […]
(3.3) —
possibly-multipliable<decltype(A), decltype(x), decltype(y)>()
istrue
, and(3.4) —
possibly-addable<decltype(y
isx), decltype(y), decltype(z)>()true
for those overloads that take az
parameter.-4- Preconditions:
(4.1) —
A.extent(0)
equalsA.extent(1)
,(4.2) —
multipliable(A, x, y)
istrue
, and(4.3) —
addable(y
isx, y, z)true
for those overloads that take az
parameter.-5- Complexity: 𝒪(
A
×x.extent(0)x
).A.extent(01)
Modify 29.9.14.3 [linalg.algs.blas2.hemv] as indicated:
-3- Mandates:
(3.1) — […]
(3.2) — […]
(3.3) —
possibly-multipliable<decltype(A), decltype(x), decltype(y)>()
istrue
, and(3.4) —
possibly-addable<decltype(y
isx), decltype(y), decltype(z)>()true
for those overloads that take az
parameter.-4- Preconditions:
(4.1) —
A.extent(0)
equalsA.extent(1)
,(4.2) —
multipliable(A, x, y)
istrue
, and(4.3) —
addable(y
isx, y, z)true
for those overloads that take az
parameter.-5- Complexity: 𝒪(
A
×x.extent(0)x
).A.extent(01)
Modify 29.9.14.4 [linalg.algs.blas2.trmv] as indicated:
[Drafting note: The extents compatibility conditions are expressed differently than in the above matrix-vector multiply sections, perhaps more for consistency with the TRSV section below. They look correct here. The original Complexity elements adjusted below are technically correct, since is square, but changing this would improve consistency with 29.9.14.1 [linalg.algs.blas2.gemv]]
template<in-matrix InMat, class Triangle, class DiagonalStorage, in-vector InVec, out-vector OutVec> void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, InVec x, OutVec y); template<class ExecutionPolicy, in-matrix InMat, class Triangle, class DiagonalStorage, in-vector InVec, out-vector OutVec> void triangular_matrix_vector_product(ExecutionPolicy&& exec, InMat A, Triangle t, DiagonalStorage d, InVec x, OutVec y);-5- […]
-6- Effects: Computes . -5- Complexity: 𝒪(A
×x.extent(0)x
).A.extent(01)template<in-matrix InMat, class Triangle, class DiagonalStorage, inout-vector InOutVec> void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, InOutVec y); template<class ExecutionPolicy, in-matrix InMat, class Triangle, class DiagonalStorage, inout-vector InOutVec> void triangular_matrix_vector_product(ExecutionPolicy&& exec, InMat A, Triangle t, DiagonalStorage d, InOutVec y);-8- […]
-9- Effects: […] -10- Complexity: 𝒪(A
×y.extent(0)y
).A.extent(01)template<in-matrix InMat, class Triangle, class DiagonalStorage, in-vector InVec1, in-vector InVec2, out-vector OutVec> void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, InVec1 x, InVec2 y, OutVec z); template<class ExecutionPolicy, in-matrix InMat, class Triangle, class DiagonalStorage, in-vector InVec1, in-vector InVec2, out-vector OutVec> void triangular_matrix_vector_product(ExecutionPolicy&& exec, InMat A, Triangle t, DiagonalStorage d, InVec1 x, InVec2 y, OutVec z);-11- […]
-12- Effects: Computes . -13- Complexity: 𝒪(A
×x.extent(0)x
).A.extent(01)
Modify 29.9.15.4 [linalg.algs.blas3.rankk] as indicated:
[Drafting note: P3371R0, to be submitted in the August 15 mailing for LEWG review, contains the same wording changes to 29.9.15.4 [linalg.algs.blas3.rankk] and 29.9.15.5 [linalg.algs.blas3.rank2k] as proposed here, with additional changes corresponding to that proposal. Please apply this LWG issue's changes first, before P3371 merges]
-3- Mandates:
(3.1) — If
InOutMat
haslayout_blas_packed
layout, then the layout'sTriangle
template argument has the same type as the function'sTriangle
template argument; and(3.2) —
possibly-multipliable<decltype(A), decltype(transposed(A)), decltype(C)>
iscompatible-static-extents<decltype(A), decltype(A)>(0, 1)true
.;
(3.3) —compatible-static-extents<decltype(C), decltype(C)>(0, 1)
istrue
; and
(3.4) —compatible-static-extents<decltype(A), decltype(C)>(0, 0)
istrue
.-4- Preconditions:
multipliable(A, transposed(A), C)
istrue
.
(4.1) —A.extent(0)
equalsA.extent(1)
,
(4.2) —C.extent(0)
equalsC.extent(1)
, and
(4.3) —A.extent(0)
equalsC.extent(0)
.-5- Complexity: 𝒪(
A.extent(0)
×A.extent(1)
×A
).C.extent(0)
Modify 29.9.15.5 [linalg.algs.blas3.rank2k] as indicated:
-3- Mandates:
(3.1) — If
InOutMat
haslayout_blas_packed
layout, then the layout'sTriangle
template argument has the same type as the function'sTriangle
template argument;(3.2) —
possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)>()
ispossibly-addable<decltype(A), decltype(B), decltype(C)>()true
; and(3.3) —
possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)>(0, 1)
iscompatible-static-extents<decltype(A), decltype(A)>(0, 1)true
.-4- Preconditions:
(4.1) —
multipliable(A, transposed(B), C)
isaddable(A, B, C)true
, and(4.2) —
multipliable(B, transposed(A), C)
istrue
.A.extent(0)
equalsA.extent(1)
-5- Complexity: 𝒪(
A.extent(0)
×A.extent(1)
×B
).C.extent(0)
Modify 29.9.15.6 [linalg.algs.blas3.trsm] as indicated:
[Drafting note: Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible]
template<in-matrix InMat1, class Triangle, class DiagonalStorage, in-matrix InMat2, out-matrix OutMat, class BinaryDivideOp> void triangular_matrix_matrix_left_solve(InMat1 A, Triangle t, DiagonalStorage d, InMat2 B, OutMat X, BinaryDivideOp divide); template<class ExecutionPolicy, in-matrix InMat1, class Triangle, class DiagonalStorage, in-matrix InMat2, out-matrix OutMat, class BinaryDivideOp> void triangular_matrix_matrix_left_solve(ExecutionPolicy&& exec, InMat1 A, Triangle t, DiagonalStorage d, InMat2 B, OutMat X, BinaryDivideOp divide);[…]
-6- Complexity: 𝒪(A.extent(0)
×B
×X.extent(1)B
).X.extent(1)
Modify 29.9.15.7 [linalg.algs.blas3.inplacetrsm] as indicated:
[Drafting note: Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible]
template<in-matrix InMat, class Triangle, class DiagonalStorage, inout-matrix InOutMat, class BinaryDivideOp> void triangular_matrix_matrix_right_solve(InMat A, Triangle t, DiagonalStorage d, InOutMat B, BinaryDivideOp divide); template<class ExecutionPolicy, in-matrix InMat, class Triangle, class DiagonalStorage, inout-matrix InOutMat, class BinaryDivideOp> void triangular_matrix_matrix_right_solve(ExecutionPolicy&& exec, InMat A, Triangle t, DiagonalStorage d, InOutMat B, BinaryDivideOp divide);[…]
-13- Complexity: 𝒪(B
×A.extent(0)A.extent(0
×1)A
).B.extent(1)