Performance on Cartesian hex meshes: bypassing per-element Jacobian/pa_data precomputation Category: Ideas #5243
findscripter started this conversation in Ideas
Replies: 1 comment 2 replies
-
Hello, that sounds very interesting! Could you please share a reference implementation of it? Thank you!
-
Hi,

I'm solving the 3D Poisson problem (-Δu = 1, homogeneous Dirichlet BC) on regular hexahedral meshes created with `Mesh::MakeCartesian3D`. I also wrote a specialized Q1 matrix-free solver that exploits the structured-grid property, and the performance gap is significant.

Test environment: single node, 2× Intel Xeon Gold 6526Y, 384 GB RAM.
Root cause (from reading MFEM source)
On a Cartesian mesh every element has the identical Jacobian J = diag(h/2, h/2, h/2), so the element stiffness matrix Ke is also identical across all elements. My solver exploits this:
- node coordinates are obtained directly as (i·h, j·h, k·h);
- the operator diagonal is diag[node] = Ke[0][0] × (number of elements sharing that node).

In contrast, MFEM's PA path (`PADiffusionSetup3D` in `bilininteg_diffusion_kernels.cpp`) precomputes `pa_data` of size Q1D³ × 6 × NE independently for every element. For Q1 (Q1D = 2) at N = 256 (~16.7 M elements), that is 8 × 6 × 16.7 M ≈ 0.8 billion doubles (~6.4 GB) of redundant, identical data. Similarly, `GeometricFactors` stores `J(NQ, 3, 3, NE)`: one full Jacobian per element, all identical.

Additionally, for Q1 (D1D = 2, Q1D = 2) the sum-factorization tensor contraction (Bᵀ D B) actually has more overhead than a direct 8×8 matrix-vector multiply.

Questions
1. Is there an existing way to avoid the redundant per-element `pa_data` storage?
2. Would there be interest in a specialized path for meshes from `MakeCartesian3D`, sharing the Jacobian/Ke across elements and reducing this memory from O(NE) to O(1)?

I'm happy to share my full implementation (serial + OpenMP + MPI, h-multigrid V-cycle + PCG) if it would be useful as a performance reference.
Thanks!