feat: add CJK friendly emphasis extension by 3w36zj6 · Pull Request #1059 · pulldown-cmark/pulldown-cmark

3w36zj6 · 2025-11-02T06:14:42Z

Under CommonMark's emphasis rules, the `**` delimiters in the following CJK examples are not recognized as emphasis and are rendered literally.

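**このアスタリスクは強調記号として認識されず、そのまま表示されます。**この文のせいで。
Japanese: "This asterisk is not recognized as an emphasis mark and is displayed as-is. Because of this sentence."

**该星号不会被识别，而是直接显示。**这是因为它没有被识别为强调符号。
Chinese: "This asterisk is not recognized; it is displayed as-is. This is because it is not recognized as an emphasis mark."

**이 별표는 강조 표시로 인식되지 않고 그대로 표시됩니다(이 괄호 때문에)**이 문장 때문에.
Korean: "This asterisk is not recognized as an emphasis mark and is displayed as-is (because of this parenthesis), because of this sentence."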
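Why the closing `**` fails: under the current CommonMark spec, a `*` run can close emphasis only if it is right-flanking, i.e. (1) not preceded by Unicode whitespace, and either (2a) not preceded by Unicode punctuation, or (2b) preceded by Unicode punctuation and followed by whitespace or punctuation. In each example above the closing run is preceded by punctuation (`。` or `)`) and followed by a CJK letter, so it fails (2b). The sketch below encodes the spec's (unamended) flanking predicates. It is an illustration, not pulldown-cmark's code: `is_punctuation` is a deliberately tiny, hypothetical stand-in for the spec's Unicode punctuation class (Rust's standard library has no general-category lookup), and `char::is_whitespace` only approximates the spec's Unicode whitespace.

```rust
/// Hypothetical, simplified stand-in for CommonMark's "Unicode punctuation
/// character": ASCII punctuation plus a few CJK marks, for illustration only.
fn is_punctuation(c: char) -> bool {
    c.is_ascii_punctuation() || matches!(c, '。' | '、' | '，' | '（' | '）' | '「' | '」')
}

/// The start/end of the line counts as whitespace; `None` models that.
/// `char::is_whitespace` approximates the spec's "Unicode whitespace".
fn is_ws(c: Option<char>) -> bool {
    c.map_or(true, char::is_whitespace)
}

fn is_punct(c: Option<char>) -> bool {
    c.map_or(false, is_punctuation)
}

/// Left-flanking per CommonMark (no CJK amendment): not followed by
/// whitespace, and either not followed by punctuation or preceded by
/// whitespace/punctuation.
fn is_left_flanking(prev: Option<char>, next: Option<char>) -> bool {
    !is_ws(next) && (!is_punct(next) || is_ws(prev) || is_punct(prev))
}

/// Right-flanking per CommonMark (no CJK amendment): the mirror image.
fn is_right_flanking(prev: Option<char>, next: Option<char>) -> bool {
    !is_ws(prev) && (!is_punct(prev) || is_ws(next) || is_punct(next))
}

fn main() {
    // Opening `**` at the start of the line can open emphasis.
    assert!(is_left_flanking(None, Some('こ')));
    // Closing `**` in the Japanese/Chinese examples: '。' before, CJK letter after.
    assert!(!is_right_flanking(Some('。'), Some('こ')));
    // Korean example: ')' before, '이' after.
    assert!(!is_right_flanking(Some(')'), Some('이')));
    // The same run followed by a space would be right-flanking.
    assert!(is_right_flanking(Some('。'), Some(' ')));
}
```

This is why a space after `。**` "fixes" the examples in plain CommonMark, which is not how CJK text is normally written.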

This pull request introduces support for CJK-friendly emphasis handling in the Markdown parser, aligning with the CommonMark CJK-friendly amendments specification.

It adds a new option to enable CJK-friendly emphasis parsing, updates the delimiter run logic to properly handle CJK characters and punctuation, and includes comprehensive tests to verify the new behavior. By default, the feature is disabled to maintain backward compatibility.

In addition to the specification, I also referred to the "Tips for Implementers" and "Concrete ranges of each terms" sections of tats-u/markdown-cjk-friendly while implementing this.

ollpu · 2025-11-03T14:58:07Z

pulldown-cmark/src/firstpass.rs

+#[inline]
+fn previous_two_chars(s: &str, ix: usize) -> (Option<char>, Option<char>) {
+    let mut iter = s[..ix].chars();
+    let mut prev_prev = None;
+    let mut prev = None;
+    while let Some(ch) = iter.next() {
+        prev_prev = prev;
+        prev = Some(ch);
+    }
+    (prev, prev_prev)
+}


This iterates through the entire prefix `s[..ix]` on every call, which makes emphasis parsing O(n^2) overall, as caught by CI.

The previous implementation uses `.chars().last()`, which takes advantage of `DoubleEndedIterator`. Also, I wouldn't put `#[inline]` on an internal function unless a benchmark indicates it helps.
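To make the complexity point concrete, the sketch below puts the original loop and the reversed-iterator version side by side (the `_forward`/`_rev` suffixes are hypothetical names used only for this comparison) and checks that they agree at every char boundary of a mixed ASCII/CJK string. The forward version decodes all of `s[..ix]` on each call, so calling it once per delimiter run costs O(n^2) on a line with O(n) runs; because `Chars` is a `DoubleEndedIterator`, `.rev()` decodes at most two chars from the end, so each call does constant work regardless of `ix`.

```rust
/// Original approach: walks every char of `s[..ix]`, O(ix) per call.
fn previous_two_chars_forward(s: &str, ix: usize) -> (Option<char>, Option<char>) {
    let mut prev_prev = None;
    let mut prev = None;
    for ch in s[..ix].chars() {
        prev_prev = prev;
        prev = Some(ch);
    }
    (prev, prev_prev)
}

/// Reversed approach: decodes only the last one or two chars.
fn previous_two_chars_rev(s: &str, ix: usize) -> (Option<char>, Option<char>) {
    let mut iter = s[..ix].chars().rev();
    let prev = iter.next();
    let prev_prev = iter.next();
    (prev, prev_prev)
}

fn main() {
    let s = "**強調**です。";
    // Compare at every valid char boundary, including 0 and s.len().
    for ix in (0..=s.len()).filter(|&i| s.is_char_boundary(i)) {
        assert_eq!(previous_two_chars_forward(s, ix), previous_two_chars_rev(s, ix));
    }
    // At the byte offset of the closing `**`, the previous two chars are '調' then '強'.
    let close = s.rfind("**").unwrap();
    assert_eq!(previous_two_chars_rev(s, close), (Some('調'), Some('強')));
}
```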

Suggested change

-#[inline]
-fn previous_two_chars(s: &str, ix: usize) -> (Option<char>, Option<char>) {
-    let mut iter = s[..ix].chars();
-    let mut prev_prev = None;
-    let mut prev = None;
-    while let Some(ch) = iter.next() {
-        prev_prev = prev;
-        prev = Some(ch);
-    }
-    (prev, prev_prev)
-}
+fn previous_two_chars(s: &str, ix: usize) -> (Option<char>, Option<char>) {
+    let mut iter = s[..ix].chars().rev();
+    let prev = iter.next();
+    let prev_prev = iter.next();
+    (prev, prev_prev)
+}

Relatedly, maybe this should not take `ix`, so that it's the caller's responsibility to slice the string.
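A minimal sketch of that suggestion, assuming the helper receives the already-sliced prefix instead of `s` and `ix`. The body is the reviewer's `.rev()` version; the calling code in `main`, which locates the closing run with `rfind`, is hypothetical and only for illustration.

```rust
/// Returns the last char of `prefix` and the char before it.
/// The caller slices the text, so no index is passed in.
fn previous_two_chars(prefix: &str) -> (Option<char>, Option<char>) {
    let mut iter = prefix.chars().rev();
    let prev = iter.next();
    let prev_prev = iter.next();
    (prev, prev_prev)
}

fn main() {
    let text = "**このアスタリスク。**この文";
    // Hypothetical caller: byte offset where the closing delimiter run starts.
    let run_start = text.rfind("**").unwrap();
    let (prev, prev_prev) = previous_two_chars(&text[..run_start]);
    assert_eq!(prev, Some('。'));
    assert_eq!(prev_prev, Some('ク'));
    // The char after the two-byte `**` run comes from the suffix.
    let next = text[run_start + 2..].chars().next();
    assert_eq!(next, Some('こ'));
    // A run at the start of the line has an empty prefix.
    assert_eq!(previous_two_chars(""), (None, None));
}
```

With this split, an offset that is not on a char boundary panics at the caller's `&text[..run_start]`, where the offset is computed, rather than inside the helper.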


ollpu reviewed Nov 3, 2025


3w36zj6 requested a review from ollpu November 4, 2025 00:15

3w36zj6 added 4 commits November 28, 2025 19:05

feat: add CJK friendly emphasis extension

8257cac

fix: remove unnecessary inline annotations

7612183

perf: optimize utility for retrieving last two characters by reversing the iterator

ae9bae0

refactor: accept prefix string directly for character retrieval

ccdd8e0

3w36zj6 force-pushed the feature/add-cjk-friendly-emphasis-extension branch from f721066 to ccdd8e0 November 28, 2025 10:05

3w36zj6 wants to merge 4 commits into pulldown-cmark:main from 3w36zj6:feature/add-cjk-friendly-emphasis-extension

2 participants
