4137. Fix Mandates, Preconditions, and Complexity elements of [linalg] algorithms

Section: 29.9.14 [linalg.algs.blas2], 29.9.15 [linalg.algs.blas3] Status: New Submitter: Mark Hoemmen Opened: 2024-08-08 Last modified: 2024-08-11

Priority: Not Prioritized

View all issues with New status.

Discussion:

As pointed out by Raffaele Solcà (CSCS Swiss National Supercomputing Centre), some of the Mandates, Preconditions, and Complexity elements of some BLAS 2 and BLAS 3 algorithms in [linalg] are incorrect.

Proposed resolution:

This wording is relative to N4988.

  1. Modify 29.9.14.1 [linalg.algs.blas2.gemv] as indicated:

    [Drafting note: This change is needed because the matrix A does not need to be square. x.extents(0) must equal A.extents(1), while y.extents(0) must equal A.extents(0).]

    -3- Mandates:

    1. (3.1) — possibly-multipliable<decltype(A), decltype(x), decltype(y)>() is true, and

    2. (3.2) — possibly-addable<decltype(yx), decltype(y), decltype(z)>() is true for those overloads that take a z parameter.

    -4- Preconditions:

    1. (4.1) — multipliable(A, x, y) is true, and

    2. (4.2) — addable(yx, y, z) is true for those overloads that take a z parameter.

    -5- Complexity: 𝒪(Ax.extent(0) × xA.extent(01)).

  2. Modify 29.9.14.2 [linalg.algs.blas2.symv] as indicated:

    -3- Mandates:

    1. (3.1) — […]

    2. (3.2) — […]

    3. (3.3) — possibly-multipliable<decltype(A), decltype(x), decltype(y)>() is true, and

    4. (3.4) — possibly-addable<decltype(yx), decltype(y), decltype(z)>() is true for those overloads that take a z parameter.

    -4- Preconditions:

    1. (4.1) — A.extent(0) equals A.extent(1),

    2. (4.2) — multipliable(A, x, y) is true, and

    3. (4.3) — addable(yx, y, z) is true for those overloads that take a z parameter.

    -5- Complexity: 𝒪(Ax.extent(0) × xA.extent(01)).

  3. Modify 29.9.14.3 [linalg.algs.blas2.hemv] as indicated:

    -3- Mandates:

    1. (3.1) — […]

    2. (3.2) — […]

    3. (3.3) — possibly-multipliable<decltype(A), decltype(x), decltype(y)>() is true, and

    4. (3.4) — possibly-addable<decltype(yx), decltype(y), decltype(z)>() is true for those overloads that take a z parameter.

    -4- Preconditions:

    1. (4.1) — A.extent(0) equals A.extent(1),

    2. (4.2) — multipliable(A, x, y) is true, and

    3. (4.3) — addable(yx, y, z) is true for those overloads that take a z parameter.

    -5- Complexity: 𝒪(Ax.extent(0) × xA.extent(01)).

  4. Modify 29.9.14.4 [linalg.algs.blas2.trmv] as indicated:

    [Drafting note: The extents compatibility conditions are expressed differently than in the above matrix-vector multiply sections, perhaps more for consistency with the TRSV section below. They look correct here. The original Complexity elements adjusted below are technically correct, since A is square, but changing this would improve consistency with 29.9.14.1 [linalg.algs.blas2.gemv]]

    template<in-matrix InMat, class Triangle, class DiagonalStorage, in-vector InVec,
             out-vector OutVec>
      void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, InVec x, OutVec y);
    template<class ExecutionPolicy,
             in-matrix InMat, class Triangle, class DiagonalStorage, in-vector InVec,
             out-vector OutVec>
      void triangular_matrix_vector_product(ExecutionPolicy&& exec,
                                            InMat A, Triangle t, DiagonalStorage d, InVec x, OutVec y);
    

    -5- […]

    -6- Effects: Computes y = Ax.

    -5- Complexity: 𝒪(Ax.extent(0) × xA.extent(01)).

    template<in-matrix InMat, class Triangle, class DiagonalStorage, inout-vector InOutVec>
      void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, InOutVec y);
    template<class ExecutionPolicy,
             in-matrix InMat, class Triangle, class DiagonalStorage, inout-vector InOutVec>
      void triangular_matrix_vector_product(ExecutionPolicy&& exec,
                                            InMat A, Triangle t, DiagonalStorage d, InOutVec y);
    

    -8- […]

    -9- Effects: […]

    -10- Complexity: 𝒪(Ay.extent(0) × yA.extent(01)).

    template<in-matrix InMat, class Triangle, class DiagonalStorage, 
             in-vector InVec1, in-vector InVec2, out-vector OutVec>
      void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d,
                                            InVec1 x, InVec2 y, OutVec z);
    template<class ExecutionPolicy,
             in-matrix InMat, class Triangle, class DiagonalStorage, 
             in-vector InVec1, in-vector InVec2, out-vector OutVec>
      void triangular_matrix_vector_product(ExecutionPolicy&& exec,
                                            InMat A, Triangle t, DiagonalStorage d,
                                            InVec1 x, InVec2 y, OutVec z);
    

    -11- […]

    -12- Effects: Computes z = y + Ax.

    -13- Complexity: 𝒪(Ax.extent(0) × xA.extent(01)).

  5. Modify 29.9.15.4 [linalg.algs.blas3.rankk] as indicated:

    [Drafting note: P3371R0, to be submitted in the August 15 mailing for LEWG review, contains the same wording changes to 29.9.15.4 [linalg.algs.blas3.rankk] and 29.9.15.5 [linalg.algs.blas3.rank2k] as proposed here, with additional changes corresponding to that proposal. Please apply this LWG issue's changes first, before P3371 merges]

    -3- Mandates:

    1. (3.1) — If InOutMat has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument; and

    2. (3.2) — possibly-multipliable<decltype(A), decltype(transposed(A)), decltype(C)> compatible-static-extents<decltype(A), decltype(A)>(0, 1) is true.;

    3. (3.3) — compatible-static-extents<decltype(C), decltype(C)>(0, 1) is true; and

    4. (3.4) — compatible-static-extents<decltype(A), decltype(C)>(0, 0) is true.

    -4- Preconditions: multipliable(A, transposed(A), C) is true.

    1. (4.1) — A.extent(0) equals A.extent(1),

    2. (4.2) — C.extent(0) equals C.extent(1), and

    3. (4.3) — A.extent(0) equals C.extent(0).

    -5- Complexity: 𝒪(A.extent(0) × A.extent(1) × AC.extent(0)).

  6. Modify 29.9.15.5 [linalg.algs.blas3.rank2k] as indicated:

    -3- Mandates:

    1. (3.1) — If InOutMat has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument;

    2. (3.2) — possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)>() possibly-addable<decltype(A), decltype(B), decltype(C)>() is true; and

    3. (3.3) — possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)>(0, 1) compatible-static-extents<decltype(A), decltype(A)>(0, 1) is true.

    -4- Preconditions:

    1. (4.1) — multipliable(A, transposed(B), C) addable(A, B, C) is true, and

    2. (4.2) — multipliable(B, transposed(A), C) is true A.extent(0) equals A.extent(1).

    -5- Complexity: 𝒪(A.extent(0) × A.extent(1) × BC.extent(0)).

  7. Modify 29.9.15.6 [linalg.algs.blas3.trsm] as indicated:

    [Drafting note: Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible]

    template<in-matrix InMat1, class Triangle, class DiagonalStorage,
             in-matrix InMat2, out-matrix OutMat, class BinaryDivideOp>
      void triangular_matrix_matrix_left_solve(InMat1 A, Triangle t, DiagonalStorage d,
                                               InMat2 B, OutMat X, BinaryDivideOp divide);
    template<class ExecutionPolicy,
             in-matrix InMat1, class Triangle, class DiagonalStorage,
             in-matrix InMat2, out-matrix OutMat, class BinaryDivideOp>
      void triangular_matrix_matrix_left_solve(ExecutionPolicy&& exec,
                                               InMat1 A, Triangle t, DiagonalStorage d,
                                               InMat2 B, OutMat X, BinaryDivideOp divide);
    

    […]

    -6- Complexity: 𝒪(A.extent(0) × BX.extent(1) × BX.extent(1)).

  8. Modify 29.9.15.7 [linalg.algs.blas3.inplacetrsm] as indicated:

    [Drafting note: Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible]

    template<in-matrix InMat, class Triangle, class DiagonalStorage,
             inout-matrix InOutMat, class BinaryDivideOp>
      void triangular_matrix_matrix_right_solve(InMat A, Triangle t, DiagonalStorage d,
                                               InOutMat B, BinaryDivideOp divide);
    template<class ExecutionPolicy,
             in-matrix InMat, class Triangle, class DiagonalStorage,
             inout-matrix InOutMat, class BinaryDivideOp>
      void triangular_matrix_matrix_right_solve(ExecutionPolicy&& exec,
                                               InMat A, Triangle t, DiagonalStorage d,
                                               InOutMat B, BinaryDivideOp divide);
    

    […]

    -13- Complexity: 𝒪(BA.extent(0) × A.extent(01) × AB.extent(1)).