Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid fetching image dimensions when for internal image. #6448

Merged
merged 19 commits into from
Aug 5, 2021

Conversation

dhaval-parekh
Copy link
Collaborator

Summary

Fixes #5115

Checklist

  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

@dhaval-parekh dhaval-parekh self-assigned this Jul 21, 2021
@dhaval-parekh dhaval-parekh force-pushed the add/5115-image-dimension branch from b00cbe7 to f4e3898 Compare July 22, 2021 10:10
@dhaval-parekh dhaval-parekh marked this pull request as ready for review July 22, 2021 10:23
@github-actions
Copy link
Contributor

github-actions bot commented Jul 22, 2021

Plugin builds for bec8f93 are ready 🛎️!

Comment on lines 177 to 186
// Get image dimension from file name.
$basename = basename( $url );
$filename_without_extension = explode( '.', $basename );
$extension = array_pop( $filename_without_extension );
$filename_without_extension = implode( '.', $filename_without_extension );

$regex = '/-(?<width>\d+)x(?<height>\d+)(?:\.' . $extension . ')$/m';
preg_match( $regex, $basename, $possible_dimension );

if ( ! empty( $possible_dimension['width'] ) && ! empty( $possible_dimension['height'] ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified:

Suggested change
// Get image dimension from file name.
$basename = basename( $url );
$filename_without_extension = explode( '.', $basename );
$extension = array_pop( $filename_without_extension );
$filename_without_extension = implode( '.', $filename_without_extension );
$regex = '/-(?<width>\d+)x(?<height>\d+)(?:\.' . $extension . ')$/m';
preg_match( $regex, $basename, $possible_dimension );
if ( ! empty( $possible_dimension['width'] ) && ! empty( $possible_dimension['height'] ) ) {
// Get image dimension from file name.
$path = wp_parse_url( $url, PHP_URL_PATH );
if ( empty( $path ) ) {
continue;
}
if ( preg_match( '/-(?<width>\d+)x(?<height>\d+)\.\w+$/', $path, $possible_dimension ) ) {

Comment on lines 203 to 204
$url_hash = md5( $url );
$transient_name = sprintf( 'amp_img_%s', $url_hash );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for obtaining the cached dimensions transient is now duplicated in this class. There should be a helper method get_cached_dimensions_transient_name that contains the duplicated logic.

}

// Get media path.
$attached_path = ltrim( str_replace( $base_upload_uri, '', $url ), '/' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $url could have query parameters here. So those should get stripped off.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, instead of doing str_replace() I suggest using substr( $url, strlen( $base_upload_uri ) ) since we know we want to strip off a certain number of characters from the beginning as discovered above.

}

// Check if it's internal media or not. If it's not then bail out.
if ( 0 !== strpos( trim( $url ), $base_upload_uri ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ( 0 !== strpos( trim( $url ), $base_upload_uri ) ) {
if ( 0 !== strpos( $url, $base_upload_uri ) ) {

if ( function_exists( 'wp_getimagesize' ) ) {
$imagesize = wp_getimagesize( $image_file );
} else {
$imagesize = getimagesize( $image_file );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$imagesize = getimagesize( $image_file );
$imagesize = @getimagesize( $image_file ); // phpcs:ignore WordPress.PHP.NoSilencedErrors

@@ -184,4 +184,61 @@ public function test__amp_wp_user_agent() {

$this->assertEquals( $expected, $user_agent );
}

/**
* @covers \AMP_Image_Dimension_Extractor::extract_by_filename_or_filesystem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @covers \AMP_Image_Dimension_Extractor::extract_by_filename_or_filesystem
* @covers \AMP_Image_Dimension_Extractor::extract_by_filename_or_filesystem()

Comment on lines 237 to 239
// Check after removing attachment metadata, To check fallback to filesystem.

wp_update_attachment_metadata( $attachment_id, [] );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check after removing attachment metadata, To check fallback to filesystem.
wp_update_attachment_metadata( $attachment_id, [] );
// Check after removing attachment metadata, to check fallback to filesystem.
wp_update_attachment_metadata( $attachment_id, [] );

@dhaval-parekh dhaval-parekh force-pushed the add/5115-image-dimension branch from f4e3898 to cdaa764 Compare July 29, 2021 11:07
* Ensure file exists before we get the size info.
* Prevent 0 from being a dimension in a file name.
* Explain why we check for external object cache.
…image-dimension

* 'develop' of github.com:ampproject/amp-wp:
  Allow the amp-carousel script to be on the page when there is just amp-lightbox-gallery (#6509)
  Mock HTTP request
  Supply non-empty body and headers for tests
  Only unserialize cached response if it is a string
  Update amp-twitter to use fixed-height layout with a default height matching minimal tweet height
  Bump eslint from 7.31.0 to 7.32.0
  Use OVERFLOW constant
  Remove unused `use` statements
  Add overflow buttons to Twitter, Facebook, and Instagram embeds
  Pin amp-toolbox-php to specific commit in main
  Update amp-toolbox-php to 98b020d6
  Remove unused AmpProject\Dom\Document imports
  Update references to renamed class constants and remove old deprecated code
  Update amp-toolbox-php to 0.7.0-alpha (main@a155c37)
  Update snapshot
  Use unique IDs for each SVG component instantiation
  Avoid redirect_extraneous_paired_endpoint from needlessly redirecting
  Revert "Prevent mutating URL when query var is absent"
  Prevent mutating URL when query var is absent
Copy link
Member

@westonruter westonruter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes check out to me. Good work.

I want to get a second review from @pierlon as well since this touches on some fundamental aspects of how the plugin has worked for a long time.

@westonruter westonruter requested a review from pierlon August 4, 2021 00:27
@westonruter westonruter added this to the v2.2 milestone Aug 4, 2021
@westonruter
Copy link
Member

Note this will specifically reduce data being stored in options as is a valid complaint about the plugin: https://wordpress.org/support/topic/this-plugin-inserted-a-lot-of-data-in-the-wp_options-table-2/

Comment on lines +206 to +215
// Verify that the URL is for an uploaded file.
if ( 0 !== strpos( $url_without_query_fragment, $base_upload_uri ) ) {
continue;
}
$upload_relative_path = substr( $url_without_query_fragment, strlen( $base_upload_uri ) );

// Bail if the URL contains relative paths.
if ( 0 !== validate_file( $upload_relative_path ) ) {
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be the first set of checks before parsing the URL for dimensions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure because the function expects a file path as input, not a URL. So I think we should make sure it looks like a file path, or else this could break if validate_file() changes its behavior in the future which isn't compatible with URLs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant moving both checks. So it would look like:

diff --git a/includes/utils/class-amp-image-dimension-extractor.php b/includes/utils/class-amp-image-dimension-extractor.php
--- a/includes/utils/class-amp-image-dimension-extractor.php	(revision 35310f1ff829be5a145d14e72af27a28b2cfc0c6)
+++ b/includes/utils/class-amp-image-dimension-extractor.php	(date 1628092492414)
@@ -184,6 +184,17 @@
 
 			$url_without_query_fragment = strtok( $url, '?#' );
 
+			// Verify that the URL is for an uploaded file.
+			if ( 0 !== strpos( $url_without_query_fragment, $base_upload_uri ) ) {
+				continue;
+			}
+			$upload_relative_path = substr( $url_without_query_fragment, strlen( $base_upload_uri ) );
+
+			// Bail if the URL contains relative paths.
+			if ( 0 !== validate_file( $upload_relative_path ) ) {
+				continue;
+			}
+
 			// Parse info out of the URL, including the file extension and (optionally) the dimensions.
 			if ( ! preg_match( '/(?:-(?<width>[1-9][0-9]*)x(?<height>[1-9][0-9]*))?\.(?<ext>\w+)$/', $url_without_query_fragment, $matches ) ) {
 				continue;
@@ -202,17 +213,6 @@
 				];
 				continue;
 			}
-
-			// Verify that the URL is for an uploaded file.
-			if ( 0 !== strpos( $url_without_query_fragment, $base_upload_uri ) ) {
-				continue;
-			}
-			$upload_relative_path = substr( $url_without_query_fragment, strlen( $base_upload_uri ) );
-
-			// Bail if the URL contains relative paths.
-			if ( 0 !== validate_file( $upload_relative_path ) ) {
-				continue;
-			}
 
 			// Get image dimension from file system.
 			$image_file = $base_upload_dir . $upload_relative_path;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just realized that this needs to be ordered in this way. The reason is that we will use the dimensions in the filename for remote files which have URLs that don't start with $base_upload_uri.

So the order is correct as it stands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, makes sense 👍

Comment on lines +206 to +215
// Verify that the URL is for an uploaded file.
if ( 0 !== strpos( $url_without_query_fragment, $base_upload_uri ) ) {
continue;
}
$upload_relative_path = substr( $url_without_query_fragment, strlen( $base_upload_uri ) );

// Bail if the URL contains relative paths.
if ( 0 !== validate_file( $upload_relative_path ) ) {
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant moving both checks. So it would look like:

diff --git a/includes/utils/class-amp-image-dimension-extractor.php b/includes/utils/class-amp-image-dimension-extractor.php
--- a/includes/utils/class-amp-image-dimension-extractor.php	(revision 35310f1ff829be5a145d14e72af27a28b2cfc0c6)
+++ b/includes/utils/class-amp-image-dimension-extractor.php	(date 1628092492414)
@@ -184,6 +184,17 @@
 
 			$url_without_query_fragment = strtok( $url, '?#' );
 
+			// Verify that the URL is for an uploaded file.
+			if ( 0 !== strpos( $url_without_query_fragment, $base_upload_uri ) ) {
+				continue;
+			}
+			$upload_relative_path = substr( $url_without_query_fragment, strlen( $base_upload_uri ) );
+
+			// Bail if the URL contains relative paths.
+			if ( 0 !== validate_file( $upload_relative_path ) ) {
+				continue;
+			}
+
 			// Parse info out of the URL, including the file extension and (optionally) the dimensions.
 			if ( ! preg_match( '/(?:-(?<width>[1-9][0-9]*)x(?<height>[1-9][0-9]*))?\.(?<ext>\w+)$/', $url_without_query_fragment, $matches ) ) {
 				continue;
@@ -202,17 +213,6 @@
 				];
 				continue;
 			}
-
-			// Verify that the URL is for an uploaded file.
-			if ( 0 !== strpos( $url_without_query_fragment, $base_upload_uri ) ) {
-				continue;
-			}
-			$upload_relative_path = substr( $url_without_query_fragment, strlen( $base_upload_uri ) );
-
-			// Bail if the URL contains relative paths.
-			if ( 0 !== validate_file( $upload_relative_path ) ) {
-				continue;
-			}
 
 			// Get image dimension from file system.
 			$image_file = $base_upload_dir . $upload_relative_path;

@westonruter westonruter merged commit 851d64b into develop Aug 5, 2021
@westonruter westonruter deleted the add/5115-image-dimension branch August 5, 2021 02:42
@westonruter westonruter added the Changelogged Whether the issue/PR has been added to release notes. label Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelogged Whether the issue/PR has been added to release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid fetching image dimensions over HTTP when possible
3 participants