Make WordPress Core

Opened 15 years ago

Last modified 7 months ago

#10483 reviewing enhancement

Change post_name's length from 200 to 400

Reported by: elnur's profile elnur Owned by: sergeybiryukov's profile SergeyBiryukov
Milestone: Future Release Priority: normal
Severity: minor Version:
Component: Permalinks Keywords: dev-feedback early has-patch needs-testing
Focuses: Cc:

Description

Hello, guys! Thank you very much for providing such a great piece of software! I love WordPress very much! :)

I use WordPress in Russian language and the URLs on my blog consist of Russian characters. There is a post with not such a long URL in Russian, but since it gets encoded to special characters it becomes too long to get fit into post_name field of post table.

I've found what code needs to be changed to increase the length. I make these changes every time a new version is released. I think it would be better to submit a patch here so that others people can benefit from it and I will not need to make those changes every release.

I'm attaching the patch to this ticket and asking you to apply it to the code.

Thank you very much again, guys! You do a great job! :)

Cheers,
Elnur

Attachments (5)

post_name_patch.diff (1.1 KB) - added by elnur 15 years ago.
post_name_patch_5_3.diff (1.0 KB) - added by mcmwebsol 5 years ago.
patch for 10453
post_name_len_increase_new.diff (996 bytes) - added by mcmwebsol 4 years ago.
Refreshed patch against trunk
10483.diff (996 bytes) - added by mcmwebsol 3 years ago.
Refreshed patch against trunk
10483_corrected.diff (1.0 KB) - added by mcmwebsol 3 years ago.
Refreshed patch against trunk - Corrected

Download all attachments as: .zip

Change History (66)

#1 @Denis-de-Bernardy
15 years ago

if my memory serves me well, the protocol actually assumes a uri is never longer than 255 chars.

#2 follow-up: @elnur
15 years ago

This link http://www.ielnur.com/blog/2009/05/%d1%81%d0%bd%d0%be%d0%b2%d0%b0-%d0%b1%d1%80%d0%be%d1%81%d0%b8%d1%82%d1%8c-%d0%ba%d1%83%d1%80%d0%b8%d1%82%d1%8c-30-%d1%82%d0%b8%d0%b4%d0%bd%d0%b5%d0%b2%d0%bd%d0%be%d0%b5-%d0%b8%d1%81%d0%bf%d1%8b%d1%82%d0%b0%d0%bd%d0%b8%d0%b5/ is 258 chars long, post_name part is 223. It works fine with our local search engine http://www.yandex.ru and with http://www.google.ru as well.

But I might be wrong in trying to change post_name's length to 400. I remembered that varchar can't hold more than 255 chars. Isn't that true?

#3 in reply to: ↑ 2 @Denis-de-Bernardy
15 years ago

Replying to elnur:

But I might be wrong in trying to change post_name's length to 400. I remembered that varchar can't hold more than 255 chars. Isn't that true?

yeah, until Mysql 5

#4 follow-up: @elnur
15 years ago

So, wouldn't someone apply this patch to the code? :)

#5 in reply to: ↑ 4 @Denis-de-Bernardy
15 years ago

  • Component changed from General to Permalinks
  • Milestone changed from 2.8.3 to Future Release
  • Owner set to ryan
  • Priority changed from normal to low
  • Severity changed from normal to minor

Replying to elnur:

So, wouldn't someone apply this patch to the code? :)

We, no... Not until MySQL 5 is the default, anyway. And then we'd need to worry about url length...

Punting to Future in the meanwhile.

#6 @elnur
15 years ago

But what if we limit it with 255 chars?

#7 @Denis-de-Bernardy
15 years ago

sure, but then there's the domain too, and the second point raised above.

#8 @solarissmoke
13 years ago

  • Keywords close added

close as maybelater?

#9 @RyanMurphy
13 years ago

  • Keywords dev-feedback added

Since we're going to MySQL5 in 3.2, can't this be considered for commit?

#10 @nacin
13 years ago

  • Milestone Future Release deleted
  • Resolution set to maybelater
  • Status changed from new to closed

We're not adjusting anything for MySQL 5 at this time. Closing as maybelater.

#11 @nacin
13 years ago

  • Keywords close removed
  • Milestone set to Future Release
  • Resolution maybelater deleted
  • Status changed from closed to reopened

Reopening for discussion.

#12 follow-up: @linuxologos
13 years ago

  • Cc linuxologos@… added

Thanks for giving the opportunity to discuss this.

This is quite a big problem for languages with an alphabet totally different from English. We can't have a post_name with more than ~38 letters, since every letter is urlencoded to be stored in the database, so every letter is converted into many more characters, dramatically cutting down the maximum possible length of the "real" post-name.

#13 in reply to: ↑ 12 ; follow-up: @hakre
13 years ago

Replying to linuxologos:

Thanks for giving the opportunity to discuss this.

This is quite a big problem for languages with an alphabet totally different from English. We can't have a post_name with more than ~38 letters, since every letter is urlencoded to be stored in the database, so every letter is converted into many more characters, dramatically cutting down the maximum possible length of the "real" post-name.

It's probably worth to drop the urlencoding then inside the storage layer. AFAIK MySQL should be able to store UTF8 in colums, so to have 200 true UTF8 characters instead of 38 to 200 urlencoded, subset of us-ascii ones.

The related refactorings could benefit the overall UTF8 support of the application as a bonus.

#14 in reply to: ↑ 13 ; follow-up: @linuxologos
13 years ago

Replying to hakre:

Replying to linuxologos:

Thanks for giving the opportunity to discuss this.

This is quite a big problem for languages with an alphabet totally different from English. We can't have a post_name with more than ~38 letters, since every letter is urlencoded to be stored in the database, so every letter is converted into many more characters, dramatically cutting down the maximum possible length of the "real" post-name.

It's probably worth to drop the urlencoding then inside the storage layer. AFAIK MySQL should be able to store UTF8 in colums, so to have 200 true UTF8 characters instead of 38 to 200 urlencoded, subset of us-ascii ones.

The related refactorings could benefit the overall UTF8 support of the application as a bonus.

MySQL is able to store UTF8 indeed and that is already the fact for post_content and post_title in (wp_)posts table. They don't get urlencoded before stored in the db. post_name is urlencoded though, and I'm not sure if it's technically safe to alter this.

#15 in reply to: ↑ 14 @hakre
13 years ago

Replying to linuxologos:

Replying to hakre:

Replying to linuxologos:

[...]

MySQL is able to store UTF8 indeed and that is already the fact for post_content and post_title in (wp_)posts table. They don't get urlencoded before stored in the db. post_name is urlencoded though, and I'm not sure if it's technically safe to alter this.

I have not said that this is a trivial change and in fact, I can not even say if the project would be able to perform such changes and a refactoring properly at all.


Replying to Denis-de-Bernardy:

if my memory serves me well, the protocol actually assumes a URI is never longer than 255 chars.

Indeed, RFC 2616 suggests to avoid URIs longer than 255 chars:

The HTTP protocol does not place any a priori limit on the length of
a URI. Servers MUST be able to handle the URI of any resource they
serve, and SHOULD be able to handle URIs of unbounded length if they
provide GET-based forms that could generate such URIs. A server
SHOULD return 414 (Request-URI Too Long) status if a URI is longer
than the server can handle (see section 10.4.15).

Note: Servers ought to be cautious about depending on URI lengths
above 255 bytes, because some older client or proxy
implementations might not properly support these lengths.

from: 3.2.1 General Syntax

Next to that 255 char limit by caution, there is a physical one for the browsers. Microsoft Internet Explorer is introducing the lowest limit which is a little bit up to 2000 characters according to WWW FAQs: What is the maximum length of a URL?. Generally these lengths relate to one-char = one-byte in us-ASCII encoding of an (urlencoded) URL, a subset of URI.

I think the 414 response is something WP don't do so far, which is classified as SHOULD. I have no idea about the overall parameters this is related to, I think those are undocumented so far which need to reveal those from the code-base first before coping with that problem which is out of the scope of this ticket as well.

#16 follow-up: @ldebrouwer
13 years ago

  • Cc info@… added

I once did a clean URL conversion from the cyrillic alphabet to the 'regular' alphabet in a different CMS. 'Дистрибьюторы' would become 'distributori' as a URL slug. This would solve the problem because I believe the latter is still valid Russian. To me this seems a good solution because this can also be applied to other character sets like ancient Greek. Please let me know if this is desired because then I will look into a WordPress patch for it.

#17 @SergeyBiryukov
13 years ago

  • Keywords needs-patch added

Related: #16230

#18 in reply to: ↑ 16 ; follow-up: @SergeyBiryukov
13 years ago

Replying to ldebrouwer:

I once did a clean URL conversion from the cyrillic alphabet to the 'regular' alphabet in a different CMS. 'Дистрибьюторы' would become 'distributori' as a URL slug. This would solve the problem because I believe the latter is still valid Russian.

It's not Cyrillic, so it's not valid Russian. Transliteration is acceptable for some people (including me), and there are some plugins which transliterate post and term slugs, but it's only a workaround and not a long-term solution for everyone.

The best solution here would be to store slugs as is, not in urlencoded form, since it only allows 33 chars for non-English slugs, which is noticeably less than the original 200 characters limit.

The problem is not only the length, though. Create two posts on 3.3-trunk with the same title:

Предлагаем супер металлообрабатывающее оборудование

The first one will just have a truncated slug:

предлагаем-супер-металлообрабатываю

But the second one will have a broken slug:

предлагаем-супер-металлообрабатыва�%-2

#20 @ryan
12 years ago

http://dev.mysql.com/doc/refman/5.0/en/string-type-overview.html

"In MySQL 5.0, the range of M is 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in MySQL 5.0.3 and later."

$required_mysql_version = '5.0';

#21 follow-up: @nacin
12 years ago

As discussed during the 3.2 cycle, we had aimed to choose 5.0.22. Version 5.0.15 was the first production version, and 5.0.22 was the first with any real usage. We didn't push it because we did not identify any version-specific things we would have wanted.

42.5% of all installs are on 5.0. However, our stats show that there are 59 total installs on 5.0.0, 5.0.1, and 5.0.2. And that number is probably inflated, based on how our stats collection works. I think it would be safe to bump the required version up a bit.

#22 in reply to: ↑ 18 @SergeyBiryukov
12 years ago

Replying to SergeyBiryukov:

But the second one will have a broken slug:

предлагаем-супер-металлообрабатыва�%-2

Related: #21013

This ticket was mentioned in IRC in #wordpress-dev by johnbillion. View the logs.


10 years ago

This ticket was mentioned in IRC in #wordpress-dev by simonwheatley. View the logs.


10 years ago

#25 @ryan
10 years ago

  • Owner ryan deleted
  • Status changed from reopened to assigned

#26 @archon810
9 years ago

I'm a bit confused. WP 4.2 changes the posts table and specifically the index on post_name to be 191 long, (which takes a long time on large tables, which is why I noticed). Reading over the reasoning, I was expecting the post_name field to be 400 long, but I still see it as 200 on all upgraded 4.2 instances.

Should I be expecting the field name to change to 400 on existing tables or is this only for new ones created from 4.2 on?

#27 in reply to: ↑ 21 ; follow-up: @netweb
8 years ago

Replying to nacin:

42.5% of all installs are on 5.0. However, our stats show that there are 59 total installs on 5.0.0, 5.0.1, and 5.0.2. And that number is probably inflated, based on how our stats collection works. I think it would be safe to bump the required version up a bit.

MySQL 5.0.x today is at 4.2% https://wordpress.org/about/stats/

Replying to Denis-de-Bernardy:

Replying to elnur:

So, wouldn't someone apply this patch to the code? :)

We, no... Not until MySQL 5 is the default, anyway. And then we'd need to worry about url length...

Punting to Future in the meanwhile.

Is now the future?

Related: #meta1884

#28 @SergeyBiryukov
8 years ago

  • Keywords early added
  • Milestone changed from Future Release to 4.7
  • Owner set to SergeyBiryukov
  • Priority changed from low to normal
  • Status changed from assigned to reviewing

#29 in reply to: ↑ 27 ; follow-up: @dd32
8 years ago

Replying to netweb:

Replying to nacin:

42.5% of all installs are on 5.0. However, our stats show that there are 59 total installs on 5.0.0, 5.0.1, and 5.0.2. And that number is probably inflated, based on how our stats collection works. I think it would be safe to bump the required version up a bit.

MySQL 5.0.x today is at 4.2% https://wordpress.org/about/stats/
..
Is now the future?

Although MySQL 5.0.x is only 4.2%, the install count is still in the millions.

MySQL 5.0.3 was the first to support > 255char for VARCHAR.
Stats currently show there are 3 sites on MySQL 5.0.2 (running WordPress 2.8 & 3.0.1), and 36 sites on < MySQL 5.0.15 (Ranges from WordPress 2.8 ~ 4.5.3). To clarify, We have no sites which are recorded as currently running MySQL 5.0.0/5.0.1/5.0.3 in the WordPress.org stats.

If we wanted to bump the requirements in order to get >255 char for VARCHAR, then it's currently safe to bump it to 5.0.4 without any real user impact.

#30 @SergeyBiryukov
8 years ago

The suggested limit of 400 seems still not enough, as it only allows up to 66 encoded characters. Would like to explore our options here.

#31 @dd32
8 years ago

It's also worth remembering that we're limited by the 767-byte index limits, so post_name can only ever be indexed by the first 767bytes / 4bytes in utf8mb4 = 191 characters, so for encoded character urls, only by the first 31 characters if they're stored encoded in the database.

#32 @netweb
8 years ago

Quoting SergeyBiryukov in #meta1884]:

For a language where most words are longer than in English, this is often insufficient. English topic slugs, on the other hand, allow up to 80 characters (which is the maxlength attribute on the input).

500 (rounded up from 484) would allow for 80 characters, kind of bringing it line with the above English max of 80.

#33 @SergeyBiryukov
8 years ago

80 was a reference to the topic title input on support forums, it looks like the title input on Edit Post screen doesn't have a maxlength attribute at all. 80 is obviously better than 33 though :)

This ticket was mentioned in Slack in #core by helen. View the logs.


8 years ago

#35 @jorbin
8 years ago

@SergeyBiryukov Still hoping to get something done for 4.7? I'm worrying that the window for something like this is coming to a close.

This ticket was mentioned in Slack in #core by chriscct7. View the logs.


8 years ago

#37 @chriscct7
8 years ago

  • Milestone changed from 4.7 to Future Release

#38 @chriscct7
8 years ago

Punted per bugscrub discussion

#39 follow-up: @savvadesogle
7 years ago

Hello, there is full instruction on russian lang. How to change url length in WordPress for cyrillic or IDN domains.

Всем русскоговорящим привет! Здесь есть подробная инструкция по изменению длины URL в Wordpress

Вот решение: https://ardua.ru/cyrillic-length-url
Все правится ручками.
Два файла formatting.php и schema.php и изменение размера поля в БД (phpmyadmin или SQL запрос)

Last edited 7 years ago by savvadesogle (previous) (diff)

#40 @antithesisgr
7 years ago

Hi..

For anyone interested, just try this:

https://wordpress.org/plugins/longer-permalinks/

#41 in reply to: ↑ 39 @mrvigner
7 years ago

Replying to savvadesogle:

Hello, there is full instruction on russian lang. How to change url length in WordPress for cyrillic or IDN domains.

Всем русскоговорящим привет! Здесь есть подробная инструкц��я по изменению длины URL в Wordpress

Вот решение: https://ardua.ru/cyrillic-length-url
Все правится ручками.
Два файла formatting.php и schema.php и изменение размера поля в БД (phpmyadmin или SQL запрос)

А можно ли сделать чтобы ВП записывал в базу урл на кирилице, а не конвертировал его? Из-за этой конвертации когда ставишь через редактор ссылку на внутреннюю страницу вместо /сайт.рф/адрес-страницы-поста/ получается такое /%d1%81%d0%b0%d0%bb%d0%b0%d1%82-%d0%b7-%d0%ba%d1%83%d1%80%d0%ba%d0%b8-%d1%96-%d1%88%d0%b0%d0%bc%d0%bf%d1%96%d0%bd%d1%8c%d0%b9%d0%be%d0%bd%d1%96%d0%b2/ . В этом же формате урл ставится в rel="canonical".
---
Пытался написать этот в комментариях на вашем сайте, но там постоянно ругается что неверная каптча, хотя все правильно отвечал.

#42 in reply to: ↑ 29 ; follow-up: @netweb
7 years ago

Replying to dd32:

Although MySQL 5.0.x is only 4.2%, the install count is still in the millions.

MySQL 5.0.3 was the first to support > 255char for VARCHAR.
Stats currently show there are 3 sites on MySQL 5.0.2 (running WordPress 2.8 & 3.0.1), and 36 sites on < MySQL 5.0.15 (Ranges from WordPress 2.8 ~ 4.5.3). To clarify, We have no sites which are recorded as currently running MySQL 5.0.0/5.0.1/5.0.3 in the WordPress.org stats.

If we wanted to bump the requirements in order to get >255 char for VARCHAR, then it's currently safe to bump it to 5.0.4 without any real user impact.

I see MySQL 5.0 is now down to 2.2%

@dd32 Hows the extended stats look that you have access to, could we bump this for WP 5.0?

#43 in reply to: ↑ 42 ; follow-up: @dd32
7 years ago

Replying to netweb:

I see MySQL 5.0 is now down to 2.2%

@dd32 Hows the extended stats look that you have access to, could we bump this for WP 5.0?

@netweb:
MySQL 5.0 is at ~1.0% for WP 4.8+4.9.
MySQL 5.1 is at ~2.5% for WP 4.8+4.9.

In order to take baby steps forward, I've dived into the data to figure out what the minimum version numbers we can update to without even thinking twice about it.

  • 5.0.90~5.0.96 accounts for 85% of sites running MySQL 5.0.
  • 5.0.44+ is where the majority of sites are who aren't running the latest releases starts - That would account for 98% of the 1.0% running MySQL 5.0.x
  • 5.0.16+ is the next major point worth mentioning. At this point the percentage of sites I'm talking about is so small it's not worth calculating.
  • 5.0~5.0.15 is used by a handful of recent sites, but mostly WP 2.x/3.x.

If we want to bump the MySQL version, just because we can, pick a version - 5.0.16 or 5.0.44 would be my suggestion - either of them would allow this ticket to be fixed. It just depends on what bugs/features were added during the MySQL 5.0 cycle.

Bumping to 5.0.16 is a no-brainer. Bumping to 5.0.44 might affect a few people, but I doubt you'd hear about it. I think if we accidentally started using a 5.0.44 feature and didn't bump the requirements, we'd probably not even hear about it.

If we want to drop support for MySQL 5.0, I'd ask for a feature that 5.1 has that we need. There's enough sites on 5.0.90+ which makes keeping 5.0 support worthwhile, for now.

#44 in reply to: ↑ 43 @netweb
7 years ago

  • Milestone changed from Future Release to 5.0

Replying to dd32:

Bumping to 5.0.16 is a no-brainer. Bumping to 5.0.44 might affect a few people, but I doubt you'd hear about it. I think if we accidentally started using a 5.0.44 feature and didn't bump the requirements, we'd probably not even hear about it.

Thanks for these stats @dd32 :+1:

Let's bump it to 5.0.16 for WP 5.0, that gets this ticket fixed with the lowest impact

Just as the Browse Happy and Server Happy teams that came before, the Query Happy team will be just taking off and ready to bump the MySQL minimum required version to 5.0.44 for WordPress 5.1...

In writing the above in jest, it's not the worst idea I've had today...

#45 @SergeyBiryukov
6 years ago

  • Milestone changed from 5.0 to 5.1

#46 @pento
6 years ago

  • Milestone changed from 5.1 to 5.2

Per this post, we're bumping the minimum MySQL version to 5.5 in April, which will probably by WordPress 5.2.

#47 @desrosj
5 years ago

  • Milestone changed from 5.2 to 5.3

This ticket has not received any attention during the 5.2 cycle. With beta 1 tomorrow, going to punt this to 5.3, especially since it is marked early.

@mcmwebsol
5 years ago

patch for 10453

#48 @mcmwebsol
5 years ago

  • Keywords has-patch added; needs-patch removed

#49 @davidbaumwald
5 years ago

  • Keywords needs-testing added

This ticket was mentioned in Slack in #core by mcmwebsol. View the logs.


5 years ago

#51 @davidbaumwald
5 years ago

@SergeyBiryukov The latest patch on this one is still good. Can this be considered for 5.3 Beta 1 landing tomorrow?

#52 @davidbaumwald
5 years ago

  • Milestone changed from 5.3 to Future Release

With version 5.3 Beta 1 releasing shortly, the deadline for enhancements is now passed. This is being moved to Future Release. @SergeyBiryukov if you feel this can be included in 5.4, feel free to move up the milestone.

#53 @robert7k
4 years ago

Bump!
Mysql 5.0 is getting even more rare, Wordpress 5.3 had been already released, 5.4 too...
Any plans to ever include this patch?

Last edited 4 years ago by robert7k (previous) (diff)

#54 follow-up: @mcmwebsol
4 years ago

I'm afraid it's too late for WordPress 5.5.

#55 in reply to: ↑ 54 @francina
4 years ago

Replying to mcmwebsol:

I'm afraid it's too late for WordPress 5.5.

Is it though? @SergeyBiryukov are you still interested in working on this? I am happy to test it.

This ticket was mentioned in Slack in #core by hareesh-pillai. View the logs.


4 years ago

#57 @audrasjb
4 years ago

  • Keywords needs-refresh added

The current patch needs a small refresh against trunk :)

@mcmwebsol
4 years ago

Refreshed patch against trunk

#58 @mcmwebsol
4 years ago

  • Keywords needs-refresh removed

#59 @mcmwebsol
3 years ago

  • Keywords needs-refresh added

@mcmwebsol
3 years ago

Refreshed patch against trunk

#60 @mcmwebsol
3 years ago

  • Keywords needs-refresh removed

@mcmwebsol
3 years ago

Refreshed patch against trunk - Corrected

#61 @sabernhardt
7 months ago

#55730 was marked as a duplicate.

Note: See TracTickets for help on using tickets.