Make WordPress Core

Opened 19 months ago

Closed 18 months ago

Last modified 17 months ago

#57373 closed defect (bug) (fixed)

WP_Query loading posts twice.

Reported by: spacedmonkey's profile spacedmonkey Owned by: spacedmonkey's profile spacedmonkey
Milestone: 6.2 Priority: normal
Severity: normal Version: 6.1
Component: Query Keywords: has-patch needs-dev-note
Focuses: performance Cc:

Description

So in [54352] how posts are primed in caches are changed. This introduced a bug. In many cases, posts are primed using split query. This results in posts being primed using _prime_post_caches and a split query. But for queries that are unlimited, like if you pass post_per_page = -1. Posts are loaded but not primed in caches. So this results in the following.

  1. Post objects are loading via a database query to select * from wp_posts.
  2. Then in line 3459, _prime_post_cache, is called again, doing another database query.

This is wasteful, as the posts are already in memory and should referenced like that.

Attachments (3)

Screenshot 2023-01-04 at 23.00.09.png (8.7 KB) - added by spacedmonkey 19 months ago.
Option 1
Screenshot 2023-01-04 at 23.00.35.png (8.1 KB) - added by spacedmonkey 19 months ago.
Option 2
Screenshot 2023-01-04 at 22.51.04.png (9.0 KB) - added by spacedmonkey 19 months ago.
Option 3

Download all attachments as: .zip

Change History (23)

#4 @spacedmonkey
19 months ago

After spending some time doing profiling, here is the results.

Option 1.

Most complex solution but results in the best results.

PR #3793

https://core.trac.wordpress.org/raw-attachment/ticket/57373/Screenshot%202023-01-04%20at%2023.00.09.png

Option 2.

Clever solution, but it results in updating caches if not needed but then going another get of all the posts. This is wasteful.

PR #3811

https://core.trac.wordpress.org/raw-attachment/ticket/57373/Screenshot%202023-01-04%20at%2023.00.35.png

Option 3.

Simplest solution, as it is a revert of the logic. However, it does mean more writes to caches that are unneeded. This results in a performance lose.

PR #3815

https://core.trac.wordpress.org/raw-attachment/ticket/57373/Screenshot%202023-01-04%20at%2022.51.04.png

#5 follow-up: @peterwilsoncc
19 months ago

I'm inclined to go with the simplest approach for the minor release and revert back (option 3).

The test failures on PR#3815 show that the unnecessary database request is no longer made.

#6 in reply to: ↑ 5 @spacedmonkey
19 months ago

Replying to peterwilsoncc:

I'm inclined to go with the simplest approach for the minor release and revert back (option 3).

I agree, I will create another issue, for the double cache writes issue.

The test failures on PR#3815 show that the unnecessary database request is no longer made.

The PR should have tests working. If you are happy @peterwilsoncc I will commit tomorrow.

#7 @flixos90
19 months ago

Left some feedback in https://github.com/WordPress/wordpress-develop/pull/3793.

If you agree with that, I think this PR is not all that complex, so in that case to me it becomes a more realistic candidate to apply even for 6.1.2. Right now it's just a lot of duplicate code which makes it look more complicated than it is I'd say.

So I would vote for option 1.

Last edited 19 months ago by flixos90 (previous) (diff)

#8 @peterwilsoncc
19 months ago

  • Owner set to spacedmonkey
  • Status changed from new to assigned

I think the option 3 PR looks good for 6.1.2 as of 0fb848b453ded2530c44b022c9da2705e1e94a43

@flixos90 My inclination it to revert the regression. As you say, the solution may be elsewhere so it would be good for those of us working on the ticket (Jonny and I most likely) to sort out a few ideas beforehand.

#9 @spacedmonkey
19 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 55035:

Query: Stop priming posts twice in WP_Query.

Part revert of [54352]. In cases where queries are not split, then it results in priming posts that are already loaded resulting in a superfluous database query.

Props spacedmonkey, flixos90, peterwilsoncc.
Fixes #57373.

#12 @spacedmonkey
19 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening for backport.

#13 @david.binda
19 months ago

Just wanted to share some observations from my testing of the r55035 on WordPress.com.

While the changeset may reduce the number of SQL queries under the conditions mentioned in this ticket, there is also a side-effect which may lead to increased memory consumption as the $update_meta_cache param (set in WP_Query via update_post_meta_cache param) behaves differently in _prime_post_caches and update_post_caches functions.

The _prime_post_caches function takes a list of post IDs, checks whether the post objects are cached (uses the _get_non_cached_ids function with posts passed as $cache_key param), and calls the update_post_caches function for non-cached post objects only (freshly fetched from the database, which is the source of the extra SQL query).

The bit about filtering out post IDs of which are cached is what makes the difference in between the _prime_post_caches and update_post_caches, as the _get_non_cached_ids function only checks a presence of key-value pairs in a single cache group. In this case it's the posts group.

However, such a check does not tell anything about the state of the cache of the post meta (and term meta) of related post objects. The post meta for a post object does not have to be in the cache despite the post object itself being there. It may be due to evictions when the persistent object cache backend is used or due to update_post_caches function being called previously with $update_meta_cache param set to false.

The update_posts_caches updates all the caches (post object, post meta, and post terms) independently on the state of any of those in the cache, ultimately leading to more data being fetched either from database of persistent cache backend.

That said, I'm seeing an increased memory consumption with the r55035 being applied. And while it feels like the update_posts_caches functionality is more correct, especially when the $update_meta_cache param is set to true (eg.: when update_post_meta_cache param of the WP_Query is used), I'm not sure whether this has been taken into consideration in terms of this ticket.

To illustrate the real world impact of the issue I'm describing, a request which has been working fine previously now, with r55035 applied, may end up with an out of memory fatal error due to attempting to load a lot of posts and cache their post metas via WP_Query.

#14 @peterwilsoncc
19 months ago

Hi @davidbinda

When updating the post meta cache, ultimately update_meta_cache() is called which includes a check for non-cached IDs about half way through the function (see source code).

For term caches, update_object_term_cache() is used and includes a similar check, see the source code.

However, you are right, in that the post object caches (ie, the post table) are written regardless of whether they are warm or not. Could that be the single cause of the increased memory footprint you are seeing?

#15 @david.binda
19 months ago

Hey Peter! I'm not sure if the post object caches being written regardless of whether they are warm or not is the cause of the increased memory footprint.

Let me illustrate the behaviour with an example.

On my local WordPress installation (using the VVV envrionment) with memcached enabled, I can see following behaviour after warming up the memcache. All the following code has been executed in a single WP CLI's shell.

$ wp shell
wp> get_post( 1 ); // Warms up the cache.
wp> global $wp_object_cache;
wp> $wp_object_cache->cache = []; // Cleans up the in-process cache for better readability.
wp> _prime_post_caches( [ 1 ], false, true ); // Does not populate the post meta in-process cache.
wp> $wp_object_cache->cache
=> array(1) {
  [";m-5D,iJF-Mw)[Tyqe0M^sxEU:_xgOC=N&d3Urn[z{d9FI0D0ExTtJ&Q/r}>0#wj:1673948700806564:wp_:posts:1"]=>
  array(2) {
    ["value"]=>
    object(stdClass)#2264 (24) {
      ["ID"]=>
      int(1)
      ["post_author"]=>
      string(1) "1"
      ["post_date"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_date_gmt"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_content"]=>
      string(137) "<!-- wp:paragraph -->
<p>Welcome to WordPress. This is your first post. Edit or delete it, then start writing!</p>
<!-- /wp:paragraph -->"
      ["post_title"]=>
      string(12) "Hello world!"
      ["post_excerpt"]=>
      string(0) ""
      ["post_status"]=>
      string(7) "publish"
      ["comment_status"]=>
      string(4) "open"
      ["ping_status"]=>
      string(4) "open"
      ["post_password"]=>
      string(0) ""
      ["post_name"]=>
      string(11) "hello-world"
      ["to_ping"]=>
      string(0) ""
      ["pinged"]=>
      string(0) ""
      ["post_modified"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_modified_gmt"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_content_filtered"]=>
      string(0) ""
      ["post_parent"]=>
      int(0)
      ["guid"]=>
      string(30) "http://two.wordpress.test/?p=1"
      ["menu_order"]=>
      int(0)
      ["post_type"]=>
      string(4) "post"
      ["post_mime_type"]=>
      string(0) ""
      ["comment_count"]=>
      string(1) "1"
      ["filter"]=>
      string(3) "raw"
    }
    ["found"]=>
    bool(true)
  }
}

The in-process cache for the post meta is not populated as the post object is already stored in memcache server, however the post object is now stored in the in-process cache. Any repeated call to _prime_post_caches( [ 1 ], false, true ) won't bring any meta to the in-process cache, thus it is not affecting the consumed memory (it's all due to the result of the _get_non_cached_ids function call ).

Just for illustration that there is no further magic down there: Calling the get_post( 1 ) does not affect the in-process cache neither:

wp> $post = get_post(1);
=> object(WP_Post)#2270 (24) {
...
}
wp> $wp_object_cache->cache
=> array(1) {
  [";m-5D,iJF-Mw)[Tyqe0M^sxEU:_xgOC=N&d3Urn[z{d9FI0D0ExTtJ&Q/r}>0#wj:1673948700806564:wp_:posts:1"]=>
  array(2) {
    ["value"]=>
    object(stdClass)#2264 (24) {
      ["ID"]=>
      int(1)
      ["post_author"]=>
      string(1) "1"
      ["post_date"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_date_gmt"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_content"]=>
      string(137) "<!-- wp:paragraph -->
<p>Welcome to WordPress. This is your first post. Edit or delete it, then start writing!</p>
<!-- /wp:paragraph -->"
      ["post_title"]=>
      string(12) "Hello world!"
      ["post_excerpt"]=>
      string(0) ""
      ["post_status"]=>
      string(7) "publish"
      ["comment_status"]=>
      string(4) "open"
      ["ping_status"]=>
      string(4) "open"
      ["post_password"]=>
      string(0) ""
      ["post_name"]=>
      string(11) "hello-world"
      ["to_ping"]=>
      string(0) ""
      ["pinged"]=>
      string(0) ""
      ["post_modified"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_modified_gmt"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_content_filtered"]=>
      string(0) ""
      ["post_parent"]=>
      int(0)
      ["guid"]=>
      string(30) "http://two.wordpress.test/?p=1"
      ["menu_order"]=>
      int(0)
      ["post_type"]=>
      string(4) "post"
      ["post_mime_type"]=>
      string(0) ""
      ["comment_count"]=>
      string(1) "1"
      ["filter"]=>
      string(3) "raw"
    }
    ["found"]=>
    bool(true)
  }
}

However, in case I run update_post_caches I get the post meta stored in the in-process cache, despite the logic you are mentioning in your comment:

wp> $posts = [ $post ];
wp> update_post_caches( $posts, false, true );
=> NULL
wp> $wp_object_cache->cache
=> array(5) {
  [";m-5D,iJF-Mw)[Tyqe0M^sxEU:_xgOC=N&d3Urn[z{d9FI0D0ExTtJ&Q/r}>0#wj:1673948700806564:wp_:posts:1"]=>
  array(2) {
    ["value"]=>
    object(stdClass)#2407 (24) {
      ["ID"]=>
      int(1)
      ["post_author"]=>
      string(1) "1"
      ["post_date"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_date_gmt"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_content"]=>
      string(137) "<!-- wp:paragraph -->
<p>Welcome to WordPress. This is your first post. Edit or delete it, then start writing!</p>
<!-- /wp:paragraph -->"
      ["post_title"]=>
      string(12) "Hello world!"
      ["post_excerpt"]=>
      string(0) ""
      ["post_status"]=>
      string(7) "publish"
      ["comment_status"]=>
      string(4) "open"
      ["ping_status"]=>
      string(4) "open"
      ["post_password"]=>
      string(0) ""
      ["post_name"]=>
      string(11) "hello-world"
      ["to_ping"]=>
      string(0) ""
      ["pinged"]=>
      string(0) ""
      ["post_modified"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_modified_gmt"]=>
      string(19) "2022-09-13 14:48:41"
      ["post_content_filtered"]=>
      string(0) ""
      ["post_parent"]=>
      int(0)
      ["guid"]=>
      string(30) "http://two.wordpress.test/?p=1"
      ["menu_order"]=>
      int(0)
      ["post_type"]=>
      string(4) "post"
      ["post_mime_type"]=>
      string(0) ""
      ["comment_count"]=>
      string(1) "1"
      ["filter"]=>
      string(3) "raw"
    }
    ["found"]=>
    bool(true)
  }
  [";m-5D,iJF-Mw)[Tyqe0M^sxEU:_xgOC=N&d3Urn[z{d9FI0D0ExTtJ&Q/r}>0#wj:1673948700806564:wp_:category_relationships:1"]=>
  array(2) {
    ["value"]=>
    array(1) {
      [0]=>
      int(1)
    }
    ["found"]=>
    bool(true)
  }
  [";m-5D,iJF-Mw)[Tyqe0M^sxEU:_xgOC=N&d3Urn[z{d9FI0D0ExTtJ&Q/r}>0#wj:1673948700806564:wp_:post_tag_relationships:1"]=>
  array(2) {
    ["value"]=>
    array(0) {
    }
    ["found"]=>
    bool(true)
  }
  [";m-5D,iJF-Mw)[Tyqe0M^sxEU:_xgOC=N&d3Urn[z{d9FI0D0ExTtJ&Q/r}>0#wj:1673948700806564:wp_:post_format_relationships:1"]=>
  array(2) {
    ["value"]=>
    array(0) {
    }
    ["found"]=>
    bool(true)
  }
  [";m-5D,iJF-Mw)[Tyqe0M^sxEU:_xgOC=N&d3Urn[z{d9FI0D0ExTtJ&Q/r}>0#wj:1673948700806564:wp_:post_meta:1"]=>
  array(2) {
    ["value"]=>
    array(0) {
    }
    ["found"]=>
    bool(true)
  }
}

The difference in the behaviour is what is increasing the memory consumption when the r55035 is in place (as it replaces the call to _prime_post_caches for update_post_caches).

The example shown above follows the similar logic as if the code were executed in the WP_Query where the results of the WP_Query were cached. The posts are primed via _prime_post_caches in https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-includes/class-wp-query.php#L3137 and then all the caches are added via update_post_caches in https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-includes/class-wp-query.php#L3479 . Previously the _prime_post_caches was used (subject of the r55035 ).

I'd like to point out that without the warm caches the OOM fatals I'm mentioning in my original comment would still happen, but the changeset makes them happen even with the caches being warm, and warming up the caches does not make the OOM to go away.

IMHO, the current behaviour (with r55035 applied) is more correct, as it properly respect the update_post_meta_cache param of the WP_Query, but the memory impact can be significant in certain situations. So I'd like to make sure this aspect of the changeset is taken into consideration.

Please, let me know if it makes the point I made cleaner.

#16 @flixos90
18 months ago

  • Keywords fixed-major added

#17 @spacedmonkey
18 months ago

  • Keywords fixed-major removed
  • Milestone changed from 6.1.2 to 6.2

Moving this to 6.2, as this is more likely to be released first.

#18 @spacedmonkey
18 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

#19 @milana_cap
17 months ago

  • Keywords needs-dev-note added

#20 @flixos90
17 months ago

Drafted a note for this to be included in the Field Guide's notes section, as shared in https://wordpress.slack.com/archives/C02KGN5K076/p1677531978930639?thread_ts=1677512400.171959&cid=C02KGN5K076.

Note: See TracTickets for help on using tickets.