Make WordPress Core

Opened 10 months ago

Last modified 6 months ago

#59608 new defect (bug)

[bug] insert_with_markers with a UTF-8 translated marker causing .htaccess file broken with garbled characters

Reported by: sammyhk
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version: 6.3.2
Component: I18N
Keywords:
Focuses:
Cc:

Description

We encountered an issue when the site language is set to Chinese (Hong Kong). The string with the translator comment "translators: 1: Marker." has a translation (https://translate.wordpress.org/projects/wp/dev/admin/zh-hk/default/?filters%5Boriginal_id%5D=8431526&filters%5Bstatus%5D=either&filters%5Btranslation_id%5D=91677098) that contains multi-byte UTF-8 characters. When insert_with_markers(...) is called to write values to the .htaccess file, the file ends up containing garbled characters, so Apache cannot parse it and returns HTTP 500.

Sample of the broken .htaccess:

# BEGIN WP Cloudflare Super Page Cache
# 在含有 BEGIN WP Cloudflare Super Page Cache 及 END WP Cloudflare Super Page Cache 標記的這�
�行間的指示詞�
�容為動�
�產生,
# 且應�
有 WordPress 篩選器能進行修改。對這�
�行間任何指示詞�
�容的變更,
# 都會遭到系統覆寫。
<IfModule mod_expires.c>
...

The expected output should be:

# BEGIN WP Cloudflare Super Page Cache
# 在含有 BEGIN WP Cloudflare Super Page Cache 及 END WP Cloudflare Super Page Cache 標記的這兩行間的指示詞內容為動態產生,
# 且應僅有 WordPress 篩選器能進行修改。對這兩行間任何指示詞內容的變更,
# 都會遭到系統覆寫。
<IfModule mod_expires.c>
...

Ref: https://build.trac.wordpress.org/browser/trunk/wp-admin/includes/misc.php?marks=141#L140
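
One observation from comparing the two samples above: every point where the broken file gains a line break falls inside a character whose UTF-8 encoding contains the byte 0x85 (兩 = e5 85 a9, 內 = e5 85 a7, 態 = e6 85 8b, 僅 = e5 83 85). The sketch below lists such characters in a string. `chars_containing_byte` is a hypothetical helper name, not a WordPress function, and this is a diagnostic aid rather than a confirmed root cause.

```php
<?php
// Hypothetical helper (not part of WordPress): return the distinct characters
// of a UTF-8 string whose encoded bytes include $byte, keyed by character,
// with the hex dump of the character as the value.
function chars_containing_byte( string $text, string $byte ): array {
	// The /u modifier makes PCRE split on code points instead of bytes.
	$chars = preg_split( '//u', $text, -1, PREG_SPLIT_NO_EMPTY );
	$hits  = array();
	foreach ( $chars as $char ) {
		if ( false !== strpos( $char, $byte ) ) {
			$hits[ $char ] = bin2hex( $char );
		}
	}
	return $hits;
}

// Fragment of the expected .htaccess comment from the description.
// Assumes this file itself is saved as UTF-8.
$expected_line = '標記的這兩行間的指示詞內容為動態產生';
print_r( chars_containing_byte( $expected_line, "\x85" ) );
```

Running the same helper over a translation before it is shipped would flag strings likely to trigger the breakage, whatever the underlying cause turns out to be.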

Change History (2)

#1 @sabernhardt
9 months ago

  • Component changed from General to I18N
  • Version changed from 6.3.3 to 6.3.2

This could belong in the Rewrite Rules component, but it also might relate to switching the locale.

#2 @hh3
6 months ago

I am also having this issue with version 6.4.2 in Chinese (Taiwan). The problem is strange in that only some characters cause it. I tried manually inserting Chinese characters into the misc.php code.

Changing this code:

$instructions = sprintf(
	__(
		'The directives (lines) between "BEGIN %1$s" and "END %1$s" are
dynamically generated, and should only be modified via WordPress filters.
Any changes to the directives between these markers will be overwritten.'
	),
	$marker
);

to this breaks the .htaccess:

$instructions = '# 兩';

but changing it to this does not:

$instructions = '# 我';
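
A possible lead for why one character breaks and the other does not: their UTF-8 bytes differ in whether they include 0x85, which is the NEL (next line) control code in the C1 range and is treated as a line terminator by some byte-oriented code. The snippet below only dumps the bytes; it does not show where the split happens.

```php
<?php
// Dump the UTF-8 bytes of the two test characters.
// Assumes this file itself is saved as UTF-8.
foreach ( array( '兩', '我' ) as $char ) {
	printf( "%s => %s\n", $char, bin2hex( $char ) );
}
// 兩 => e585a9  (middle byte is 0x85)
// 我 => e68891  (no 0x85)
```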

In my php.ini, default_charset = "UTF-8"; input_encoding and output_encoding are both unset.

Another test case I tried was to manually insert # 兩 somewhere else in the .htaccess. This also causes the problem after insert_with_markers is executed. After reading through the code, it appears to read and rewrite the entire file. This suggests to me that the problem occurs when the strings are rewritten into .htaccess.

Forcing the code to call mb_convert_encoding($line, 'UTF-8') on every single line, as suggested in multiple posts on StackExchange, also doesn't seem to work.
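
A plausible reason re-encoding does not help: if a multi-byte character has already been cut across two lines, neither fragment is well-formed UTF-8 on its own, so a per-line conversion has nothing valid to work with. The check below is a minimal sketch; `is_valid_utf8` is a hypothetical helper name, and the cut position is chosen by hand to mimic the broken sample.

```php
<?php
// Hypothetical diagnostic helper: true if $line is well-formed UTF-8.
// preg_match() with the /u modifier returns false on malformed input.
function is_valid_utf8( string $line ): bool {
	return 1 === preg_match( '//u', $line );
}

$whole = '# 兩';                  // bytes: 23 20 e5 85 a9
$left  = substr( $whole, 0, 3 );  // "# \xe5", the part before 0x85
$right = substr( $whole, 4 );     // "\xa9", the part after 0x85

var_dump( is_valid_utf8( $whole ) ); // bool(true)
var_dump( is_valid_utf8( $left ) );  // bool(false)
var_dump( is_valid_utf8( $right ) ); // bool(false)
```

Running such a check on each line right after reading and right before writing would show which step first produces an invalid line.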

I also tried forcing the code to write the UTF-8 BOM at the beginning of the file, but Apache also fails with HTTP 500 upon reading the BOM.

Lastly, I copied the function out into a stand-alone PHP file and executed it on the command line in the server environment. This does NOT reproduce the problem and results in a usable .htaccess file. I had to disable switch_to_locale for the stand-alone run, but disabling it in misc.php as well did not seem to make any difference.
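
For anyone wanting to repeat the stand-alone test, a reduced round trip could look like the sketch below. It is a simplified stand-in for the read/rewrite cycle described above (fgets, rtrim, implode, write back), not a copy of the core function, and the temp file is an assumption standing in for .htaccess.

```php
<?php
// Simplified stand-in for the read/rewrite cycle, NOT the core function.
// A temp file plays the role of .htaccess.
$file = tempnam( sys_get_temp_dir(), 'htaccess' );
file_put_contents( $file, "# 兩\n<IfModule mod_expires.c>\n</IfModule>" );
$before = file_get_contents( $file );

// Read every line and strip only "\r" and "\n" from the end.
$fp    = fopen( $file, 'r+' );
$lines = array();
while ( ! feof( $fp ) ) {
	$lines[] = rtrim( (string) fgets( $fp ), "\r\n" );
}

// Rewind, overwrite with the re-joined lines, truncate to the new length.
$data = implode( "\n", $lines );
fseek( $fp, 0 );
fwrite( $fp, $data );
ftruncate( $fp, ftell( $fp ) );
fclose( $fp );

$after = file_get_contents( $file );
unlink( $file );

// Compare byte-for-byte; on a mismatch, print the new bytes for inspection.
echo $before === $after ? "unchanged\n" : bin2hex( $after ) . "\n";
```

Running the same script once from the CLI and once through the web server would narrow down whether the plain read/rewrite step is affected at all, or whether something else in the request changes the data.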

So there must be something different about executing this code locally on the command line versus via CGI. phpinfo did not reveal any difference in default_charset or the other encoding values.
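
One mechanism that produces exactly this kind of damage is splitting text on PCRE's \R without the /u modifier: in byte mode \R also matches the single byte 0x85, which is the middle byte of 兩 (e5 85 a9). Nothing in this ticket establishes that any code path involved actually does this, so the sketch below is only an illustration of the hypothesis, assuming PCRE is built with its default (Unicode) \R convention.

```php
<?php
// Illustration only: no code path in this ticket is known to use \R.
$text = '# 兩'; // bytes: 23 20 e5 85 a9

// Byte mode (no /u): \R also matches the lone byte 0x85 (NEL),
// so the three-byte character is cut in two.
$byte_mode = preg_split( '/\R/', $text );

// UTF-8 mode (/u): 0x85 is only part of a character, so nothing splits.
$utf8_mode = preg_split( '/\R/u', $text );

// explode() on "\n" never looks at 0x85.
$exploded = explode( "\n", $text );

var_dump( count( $byte_mode ) ); // int(2) with PCRE's default \R convention
var_dump( count( $utf8_mode ) ); // int(1)
var_dump( count( $exploded ) );  // int(1)
```

If something in the web request (a plugin, a filter, or a wrapper around the write) normalises line endings this way, it would explain why only characters containing 0x85 break and why a stand-alone copy of the function does not reproduce it.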

Last edited 6 months ago by hh3