Perf: Optimize function append in include/fmt/base.h#4541

Merged
vitaut merged 4 commits into fmtlib:master from
fyrsta7:perf-optimize-some-function
Sep 21, 2025

Conversation

@fyrsta7 (Contributor) commented Sep 14, 2025

Summary

This PR optimizes the performance of the append function in include/fmt/base.h.
This performance improvement was identified while profiling the spdlog logging library, a major downstream project that bundles fmt. The benchmarks show that this optimization provides significant gains in spdlog's throughput and latency. For context, this change was first submitted to spdlog (see gabime/spdlog#3465) where I was advised to contribute it upstream directly to fmt.
Across 47 test cases from spdlog's benchmark suite, this change achieves a maximum improvement of 16.67%, with no regression exceeding 0.99% in any case.

Test Plan

  1. Correctness: All existing unit tests in the fmt project pass.
  2. Performance: The performance impact was evaluated using the comprehensive benchmark suite from the spdlog project. This was chosen because spdlog is a key real-world use case, and its benchmarks effectively measure the performance of fmt's core formatting operations under various conditions (single-threaded, multi-threaded, different logging patterns). The commands used were ./bench/bench and ./bench/latency from the spdlog repository.

Performance Evaluation & Results

Testing Protocol:

  • The benchmark was run on an isolated Ubuntu 24.04 server using the spdlog benchmark suite.
  • The first run was discarded to account for cold-start effects.
  • The results below are the average of 5 subsequent runs.
  • The improvement for a test case is calculated as (new_value - old_value) / old_value * 100% if a higher value is better (e.g., throughput), or (old_value - new_value) / new_value * 100% if a lower value is better (e.g., latency). A positive percentage indicates a performance gain.

Results:

| Test Case | Improvement |
| --- | --- |
| overall_throughput_improvement | 3.00% |
| overall_latency_improvement | 3.19% |
| single_threaded.level_off.normal.messages_per_sec | 0.37% |
| single_threaded.level_off.backtrace_on.messages_per_sec | 4.33% |
| single_threaded.rotating_st.normal.messages_per_sec | 2.82% |
| single_threaded.rotating_st.backtrace_on.messages_per_sec | 1.95% |
| single_threaded.basic_st.normal.messages_per_sec | 3.15% |
| single_threaded.basic_st.backtrace_on.messages_per_sec | 3.48% |
| single_threaded.daily_st.normal.messages_per_sec | 3.13% |
| single_threaded.daily_st.backtrace_on.messages_per_sec | 3.12% |
| multi_threaded_1.rotating_mt.normal.messages_per_sec | 1.41% |
| multi_threaded_1.rotating_mt.backtrace_on.messages_per_sec | 3.45% |
| multi_threaded_1.daily_mt.normal.messages_per_sec | 2.73% |
| multi_threaded_1.daily_mt.backtrace_on.messages_per_sec | 12.95% |
| multi_threaded_1.basic_mt.normal.messages_per_sec | 2.48% |
| multi_threaded_1.basic_mt.backtrace_on.messages_per_sec | 2.73% |
| multi_threaded_1.level_off.normal.messages_per_sec | -0.19% |
| multi_threaded_1.level_off.backtrace_on.messages_per_sec | 4.35% |
| multi_threaded_4.rotating_mt.normal.messages_per_sec | 2.49% |
| multi_threaded_4.rotating_mt.backtrace_on.messages_per_sec | 1.82% |
| multi_threaded_4.daily_mt.normal.messages_per_sec | 6.08% |
| multi_threaded_4.daily_mt.backtrace_on.messages_per_sec | 3.00% |
| multi_threaded_4.basic_mt.normal.messages_per_sec | 3.94% |
| multi_threaded_4.basic_mt.backtrace_on.messages_per_sec | -0.69% |
| multi_threaded_4.level_off.normal.messages_per_sec | -0.99% |
| multi_threaded_4.level_off.backtrace_on.messages_per_sec | 4.07% |
| single_threaded.level_off.backtrace_on.elapsed_time | 0.00% |
| single_threaded.rotating_st.normal.elapsed_time | 0.00% |
| single_threaded.rotating_st.backtrace_on.elapsed_time | 0.00% |
| single_threaded.basic_st.normal.elapsed_time | 0.00% |
| single_threaded.basic_st.backtrace_on.elapsed_time | 16.67% |
| single_threaded.daily_st.normal.elapsed_time | 3.33% |
| single_threaded.daily_st.backtrace_on.elapsed_time | 12.50% |
| multi_threaded_1.rotating_mt.normal.elapsed_time | 3.03% |
| multi_threaded_1.rotating_mt.backtrace_on.elapsed_time | 4.76% |
| multi_threaded_1.daily_mt.normal.elapsed_time | 0.00% |
| multi_threaded_1.daily_mt.backtrace_on.elapsed_time | 14.89% |
| multi_threaded_1.basic_mt.normal.elapsed_time | 0.00% |
| multi_threaded_1.basic_mt.backtrace_on.elapsed_time | 0.00% |
| multi_threaded_1.level_off.backtrace_on.elapsed_time | 0.00% |
| multi_threaded_4.rotating_mt.normal.elapsed_time | 1.59% |
| multi_threaded_4.rotating_mt.backtrace_on.elapsed_time | 0.00% |
| multi_threaded_4.daily_mt.normal.elapsed_time | 6.06% |
| multi_threaded_4.daily_mt.backtrace_on.elapsed_time | 2.47% |
| multi_threaded_4.basic_mt.normal.elapsed_time | 1.67% |
| multi_threaded_4.basic_mt.backtrace_on.elapsed_time | 0.00% |
| multi_threaded_4.level_off.backtrace_on.elapsed_time | 0.00% |

@vitaut (Contributor) left a comment

Thanks for the PR!

```diff
-  if (free_cap < count) count = free_cap;
+  auto count = to_unsigned(end - begin);
+  if (free_cap < count) {
+    try_reserve(size_ + count);
```
Let's replace this with

```cpp
grow_(*this, size_ + count);
```

to avoid an extra capacity check.

@fyrsta7 (author) replied:

Thanks for the suggestion!
I've replaced try_reserve with grow_ to avoid the extra capacity check as you pointed out. The code has been updated.

@vitaut vitaut merged commit 4cce5f4 into fmtlib:master Sep 21, 2025
41 checks passed
@vitaut commented Sep 21, 2025

Merged, thanks!

@fyrsta7 (author) commented Sep 21, 2025

Thank you for your quick action! 🫶

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Nov 2, 2025
# 12.1.0 - 2025-10-29

- Optimized `buffer::append`, resulting in up to ~16% improvement on spdlog
  benchmarks (fmtlib/fmt#4541). Thanks @fyrsta7.

- Worked around an ABI incompatibility in `std::locale_ref` between clang and
  gcc (fmtlib/fmt#4573).

- Made `std::variant` and `std::expected` formatters work with `format_as`
  (fmtlib/fmt#4574,
  fmtlib/fmt#4575). Thanks @phprus.

- Made `fmt::join<string_view>` work with C++ modules
  (fmtlib/fmt#4379,
  fmtlib/fmt#4577). Thanks @Arghnews.

- Exported `fmt::is_compiled_string` and `operator""_cf` from the module
  (fmtlib/fmt#4544). Thanks @CrackedMatter.

- Fixed a compatibility issue with C++ modules in clang
  (fmtlib/fmt#4548). Thanks @tsarn.

- Added support for cv-qualified types to the `std::optional` formatter
  (fmtlib/fmt#4561,
  fmtlib/fmt#4562). Thanks @OleksandrKvl.

- Added demangling support (used in exception and `std::type_info` formatters)
  for libc++ and clang-cl
  (fmtlib/fmt#4542,
  fmtlib/fmt#4560,
  fmtlib/fmt#4568,
  fmtlib/fmt#4571).
  Thanks @FatihBAKIR and @rohitsutreja.

- Switched to global `malloc`/`free` to enable allocator customization
  (fmtlib/fmt#4569,
  fmtlib/fmt#4570). Thanks @rohitsutreja.

- Made the `FMT_USE_CONSTEVAL` macro configurable by users
  (fmtlib/fmt#4546). Thanks @SnapperTT.

- Fixed compilation with locales disabled in the header-only mode
  (fmtlib/fmt#4550).

- Fixed compilation with clang 21 and `-std=c++20`
  (fmtlib/fmt#4552).

- Fixed a dynamic linking issue with clang-cl
  (fmtlib/fmt#4576,
  fmtlib/fmt#4584). Thanks @FatihBAKIR.

- Fixed a warning suppression leakage on gcc
  (fmtlib/fmt#4588). Thanks @ZedThree.

- Made more internal color APIs `constexpr`
  (fmtlib/fmt#4581). Thanks @ishani.

- Fixed compatibility with clang as a host compiler for NVCC
  (fmtlib/fmt#4564). Thanks @valgur.

- Fixed various warnings and lint issues
  (fmtlib/fmt#4565,
  fmtlib/fmt#4572,
  fmtlib/fmt#4557).
  Thanks @LiangHuDream and @teruyamato0731.

- Improved documentation
  (fmtlib/fmt#4549,
  fmtlib/fmt#4551,
  fmtlib/fmt#4566,
  fmtlib/fmt#4567,
  fmtlib/fmt#4578).
  Thanks @teruyamato0731, @petersteneteg and @zimmerman-dev.
polter-rnd added a commit to polter-rnd/slimlog that referenced this pull request Dec 15, 2025
polter-rnd added a commit to polter-rnd/slimlog that referenced this pull request Dec 30, 2025