Performance on Cartesian hex meshes: bypassing per-element Jacobian/pa_data precomputation Category: Ideas #5243
findscripter started this conversation in Ideas
Replies: 1 comment 2 replies
-
Hello, that sounds very interesting! Could you please share a reference implementation of it? Thank you!
-
Hi,

I'm solving the 3D Poisson problem (-Δu = 1, homogeneous Dirichlet BC) on regular hexahedral meshes created with `Mesh::MakeCartesian3D`. I also wrote a specialized Q1 matrix-free solver that exploits the structured-grid property, and the performance gap is significant.

Test environment: single node, 2× Intel Xeon Gold 6526Y, 384 GB RAM.
Root cause (from reading MFEM source)
On a Cartesian mesh every element has the identical Jacobian J = diag(h/2, h/2, h/2), so the element stiffness matrix Ke is also identical across all elements. My solver exploits this:
- node coordinates are obtained directly as (i·h, j·h, k·h);
- the operator diagonal is diag[node] = Ke[0][0] × (number of elements sharing that node).

In contrast, MFEM's PA path (`PADiffusionSetup3D` in `bilininteg_diffusion_kernels.cpp`) precomputes `pa_data` of size Q1D³ × 6 × NE independently for every element. For Q1 (Q1D = 2) at N = 256 (~16.7 M elements), that is 8 × 6 × 16.7 M ≈ 0.8 billion doubles (~6.4 GB) of redundant, identical data. Similarly, `GeometricFactors` stores `J(NQ, 3, 3, NE)`: one full Jacobian per element, all identical.

Additionally, for Q1 (D1D = 2, Q1D = 2) the sum-factorization tensor contraction (Bᵀ D B) actually has more overhead than a direct 8×8 matrix-vector multiply.

Questions
1. Is there an existing way to avoid the redundant per-element `pa_data` storage?
2. Would there be interest in a specialized path for meshes from `MakeCartesian3D`, sharing the Jacobian/Ke across elements and reducing this memory from O(NE) to O(1)?

I'm happy to share my full implementation (serial + OpenMP + MPI, h-multigrid V-cycle + PCG) if it would be useful as a performance reference.
Thanks!