Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid fetching image dimensions when for internal image. #6448

Merged
merged 19 commits into from
Aug 5, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
371e02b
Add image dimension extractor by filename and from file system
dhaval-parekh Jul 6, 2021
ae99bf4
Add fallback for image dimension if attachment metadata is not available
dhaval-parekh Jul 7, 2021
ce58f9c
Add caching which fetching attachment ID by attached path
dhaval-parekh Jul 7, 2021
4c5e520
Add unit test cases
dhaval-parekh Jul 7, 2021
cf41aee
Fix unit test cases
dhaval-parekh Jul 7, 2021
a8bf41d
Add check for function exist for wp_getimagesize()
dhaval-parekh Jul 7, 2021
0bb00d6
Apply suggestions from PR and add transient cache for image dimention
dhaval-parekh Jul 22, 2021
4474983
Update unit test cases
dhaval-parekh Jul 22, 2021
cdaa764
Apply suggestion from PR
dhaval-parekh Jul 29, 2021
e6af9f2
update unit test cases
dhaval-parekh Jul 29, 2021
45689e5
Harden extract_by_filename_or_filesystem logic
westonruter Aug 3, 2021
73e68de
Merge branch 'develop' of github.com:ampproject/amp-wp into add/5115-…
westonruter Aug 3, 2021
b1267a8
Restore wp_using_ext_object_cache() in tearDown(); use dataProvider
westonruter Aug 3, 2021
2fffaae
Privatize helper method for getting transient names and return positi…
westonruter Aug 3, 2021
523be81
Add testing for when dimensions get stored in transients
westonruter Aug 3, 2021
38355ea
Add sanity checking for image filename extensions
westonruter Aug 4, 2021
35310f1
Add tests for fragment identifiers on URLs
westonruter Aug 4, 2021
682d516
Use `strtolower()` and add strict param for `in_array()`
westonruter Aug 4, 2021
bec8f93
Increase filter priority of extract_by_filename_or_filesystem to 100
westonruter Aug 5, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
130 changes: 124 additions & 6 deletions includes/utils/class-amp-image-dimension-extractor.php
Original file line number Diff line number Diff line change
Expand Up @@ -140,6 +140,7 @@ public static function normalize_url( $url ) {
private static function register_callbacks() {
self::$callbacks_registered = true;

add_filter( 'amp_extract_image_dimensions_batch', [ __CLASS__, 'extract_by_filename_or_filesystem' ], 100 );
add_filter( 'amp_extract_image_dimensions_batch', [ __CLASS__, 'extract_by_downloading_images' ], 999, 1 );

/**
Expand All @@ -150,6 +151,125 @@ private static function register_callbacks() {
do_action( 'amp_extract_image_dimensions_batch_callbacks_registered' );
}

/**
* Extract dimensions from filename if dimension exists or from file system.
*
* @param array $dimensions Image urls mapped to dimensions.
* @return array Dimensions mapped to image urls, or false if they could not be retrieved
*/
public static function extract_by_filename_or_filesystem( $dimensions ) {

if ( empty( $dimensions ) || ! is_array( $dimensions ) ) {
return [];
}

$using_ext_object_cache = wp_using_ext_object_cache();
$ext_types = wp_get_ext_types();
if ( empty( $ext_types['image'] ) ) {
return $dimensions;
}
$image_ext_types = $ext_types['image'];
unset( $ext_types );

$upload_dir = wp_get_upload_dir();
$base_upload_uri = trailingslashit( $upload_dir['baseurl'] );
$base_upload_dir = trailingslashit( $upload_dir['basedir'] );

foreach ( $dimensions as $url => $value ) {

// Check whether some other callback attached to the filter already provided dimensions for this image.
if ( ! empty( $value ) && is_array( $value ) ) {
continue;
}

$url_without_query_fragment = strtok( $url, '?#' );

// Parse info out of the URL, including the file extension and (optionally) the dimensions.
if ( ! preg_match( '/(?:-(?<width>[1-9][0-9]*)x(?<height>[1-9][0-9]*))?\.(?<ext>\w+)$/', $url_without_query_fragment, $matches ) ) {
continue;
}

// Skip images don't have recognized extensions.
if ( ! in_array( strtolower( $matches['ext'] ), $image_ext_types, true ) ) {
continue;
}

// Use image dimension from the file name.
if ( ! empty( $matches['width'] ) && ! empty( $matches['height'] ) ) {
$dimensions[ $url ] = [
'width' => (int) $matches['width'],
'height' => (int) $matches['height'],
];
continue;
}

// Verify that the URL is for an uploaded file.
if ( 0 !== strpos( $url_without_query_fragment, $base_upload_uri ) ) {
continue;
}
$upload_relative_path = substr( $url_without_query_fragment, strlen( $base_upload_uri ) );

// Bail if the URL contains relative paths.
if ( 0 !== validate_file( $upload_relative_path ) ) {
continue;
}
Comment on lines +206 to +215
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be the first set of checks before parsing the URL for dimensions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure because the function expects a file path as input, not a URL. So I think we should make sure it looks like a file path, or else this could break if validate_file() changes its behavior in the future which isn't compatible with URLs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant moving both checks. So it would look like:

diff --git a/includes/utils/class-amp-image-dimension-extractor.php b/includes/utils/class-amp-image-dimension-extractor.php
--- a/includes/utils/class-amp-image-dimension-extractor.php	(revision 35310f1ff829be5a145d14e72af27a28b2cfc0c6)
+++ b/includes/utils/class-amp-image-dimension-extractor.php	(date 1628092492414)
@@ -184,6 +184,17 @@
 
 			$url_without_query_fragment = strtok( $url, '?#' );
 
+			// Verify that the URL is for an uploaded file.
+			if ( 0 !== strpos( $url_without_query_fragment, $base_upload_uri ) ) {
+				continue;
+			}
+			$upload_relative_path = substr( $url_without_query_fragment, strlen( $base_upload_uri ) );
+
+			// Bail if the URL contains relative paths.
+			if ( 0 !== validate_file( $upload_relative_path ) ) {
+				continue;
+			}
+
 			// Parse info out of the URL, including the file extension and (optionally) the dimensions.
 			if ( ! preg_match( '/(?:-(?<width>[1-9][0-9]*)x(?<height>[1-9][0-9]*))?\.(?<ext>\w+)$/', $url_without_query_fragment, $matches ) ) {
 				continue;
@@ -202,17 +213,6 @@
 				];
 				continue;
 			}
-
-			// Verify that the URL is for an uploaded file.
-			if ( 0 !== strpos( $url_without_query_fragment, $base_upload_uri ) ) {
-				continue;
-			}
-			$upload_relative_path = substr( $url_without_query_fragment, strlen( $base_upload_uri ) );
-
-			// Bail if the URL contains relative paths.
-			if ( 0 !== validate_file( $upload_relative_path ) ) {
-				continue;
-			}
 
 			// Get image dimension from file system.
 			$image_file = $base_upload_dir . $upload_relative_path;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just realized that this needs to be ordered in this way. The reason is that we will use the dimensions in the filename for remote files which have URLs that don't start with $base_upload_uri.

So the order is correct as it stands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, makes sense 👍


// Get image dimension from file system.
$image_file = $base_upload_dir . $upload_relative_path;

$image_size = [];

list( $transient_name ) = self::get_transient_names( $url );

// When using an external object cache, try to first see if dimensions have already been obtained. This is
// not done for a non-external object cache (i.e. when wp_options is used for transients) because then
// we are not storing the dimensions in a transient, because it is more performant to read the dimensions
// from the filesystem--both in terms of time and storage--than to store dimensions in wp_options.
if ( $using_ext_object_cache ) {
$image_size = get_transient( $transient_name );
$image_size = ( ! empty( $image_size ) && is_array( $image_size ) ) ? $image_size : [];
}

if ( ( empty( $image_size ) || ! is_array( $image_size ) ) && file_exists( $image_file ) ) {
if ( function_exists( 'wp_getimagesize' ) ) {
$image_size = wp_getimagesize( $image_file );
} elseif ( function_exists( 'getimagesize' ) ) {
$image_size = @getimagesize( $image_file ); // phpcs:ignore WordPress.PHP.NoSilencedErrors
}

if ( $using_ext_object_cache && ! empty( $image_size ) && is_array( $image_size ) ) {
set_transient( $transient_name, $image_size );
}
}

if ( ! empty( $image_size ) && is_array( $image_size ) ) {
$dimensions[ $url ] = [
'width' => (int) $image_size[0],
'height' => (int) $image_size[1],
];
}
}

return $dimensions;
}

/**
* Get transient names.
*
* @param string $url Image URL.
* @return array {
* @type string $0 Transient name for storing dimensions.
* @type string $1 Transient name for image fetching lock.
* }
*/
private static function get_transient_names( $url ) {
$url_hash = md5( $url );
return [
sprintf( 'amp_img_%s', $url_hash ),
sprintf( 'amp_lock_%s', $url_hash ),
];
}

/**
* Extract dimensions from downloaded images (or transient/cached dimensions from downloaded images)
*
Expand Down Expand Up @@ -190,12 +310,12 @@ private static function determine_which_images_to_fetch( &$dimensions, &$urls_to
foreach ( $dimensions as $url => $value ) {

// Check whether some other callback attached to the filter already provided dimensions for this image.
if ( is_array( $value ) ) {
if ( is_array( $value ) || empty( $url ) ) {
continue;
}

$url_hash = md5( $url );
$transient_name = sprintf( 'amp_img_%s', $url_hash );
list( $transient_name, $transient_lock_name ) = self::get_transient_names( $url );

$cached_dimensions = get_transient( $transient_name );

// If we're able to retrieve the dimensions from a transient, set them and move on.
Expand All @@ -213,8 +333,6 @@ private static function determine_which_images_to_fetch( &$dimensions, &$urls_to
continue;
}

$transient_lock_name = sprintf( 'amp_lock_%s', $url_hash );

// If somebody is already trying to extract dimensions for this transient right now, move on.
if ( false !== get_transient( $transient_lock_name ) ) {
$dimensions[ $url ] = false;
Expand Down Expand Up @@ -243,7 +361,7 @@ private static function fetch_images( $urls_to_fetch, &$images ) {
$client = new \FasterImage\FasterImage();

/**
* Filters the user agent for onbtaining the image dimensions.
* Filters the user agent for obtaining the image dimensions.
*
* @param string $user_agent User agent.
*/
Expand Down
153 changes: 150 additions & 3 deletions tests/php/test-amp-image-dimension-extractor.php
Original file line number Diff line number Diff line change
Expand Up @@ -5,21 +5,33 @@
* @package AMP
*/

use AmpProject\AmpWP\Tests\Helpers\PrivateAccess;

/**
* Tests for AMP_Image_Dimension_Extractor.
*
* @covers AMP_Image_Dimension_Extractor
*/
class AMP_Image_Dimension_Extractor_Extract_Test extends WP_UnitTestCase {

/**
* Set up.
*/
use PrivateAccess;

/** @var bool */
private $using_ext_object_cache;

public function setUp() {
parent::setUp();

// We don't want to actually download the images; just testing the extract method.
add_action( 'amp_extract_image_dimensions_batch_callbacks_registered', [ $this, 'disable_downloads' ] );

$this->using_ext_object_cache = wp_using_ext_object_cache();
}

public function tearDown() {
parent::tearDown();

wp_using_ext_object_cache( $this->using_ext_object_cache );
}

/**
Expand Down Expand Up @@ -184,4 +196,139 @@ public function test__amp_wp_user_agent() {

$this->assertEquals( $expected, $user_agent );
}

/** @return array */
public function get_data_for_test_extract_by_filename_or_filesystem() {
return [
'without_ext_object_cache' => [ false ],
'with_ext_object_cache' => [ true ],
];
}

/**
* @dataProvider get_data_for_test_extract_by_filename_or_filesystem
*
* @covers \AMP_Image_Dimension_Extractor::extract_by_filename_or_filesystem()
*/
public function test_extract_by_filename_or_filesystem( $using_ext_object_cache ) {
wp_using_ext_object_cache( $using_ext_object_cache );

$attachment_id = $this->factory()->attachment->create_upload_object( __DIR__ . '/data/images/wordpress-logo.png' );

$full_image = wp_get_attachment_image_src( $attachment_id, 'full' );
$this->assertNotRegExp( '/-\d+x\d+\.\w+/', $full_image[0], 'Expected no dimensions in filename.' );
$thumbnail_image = wp_get_attachment_image_src( $attachment_id, 'thumbnail' );
$this->assertRegExp( '/-\d+x\d+\.\w+/', $thumbnail_image[0], 'Expected dimensions in file name. ' );

$external_image_with_dims_in_url = 'https://example.com/wp-content/uploads/2021/04/American_bison_k5680-1-1024x668.jpg';
$external_image_1 = 'https://via.placeholder.com/1500/000.png/FF0';
$external_image_2 = 'https://via.placeholder.com/1000/000.png/FF0';
$image_with_query_string = 'https://example.com/wp-content/uploads/2021/04/American_bison_k5680-1-512x768.jpg?crop=1';
$image_with_fragment = 'https://example.com/wp-content/uploads/2021/04/American_bison_k5680-1-512x768.jpg#interesting';
$image_with_query_string_and_frag = 'https://example.com/wp-content/uploads/2021/04/American_bison_k5680-1-512x768.jpg?crop=1#interesting';
$internal_image_with_query_string = $full_image[0] . '?crop=1&resize=1';
$audio_file = 'https://example.com/music.mp3';

$data = [
$full_image[0] => [
'input' => [],
'expected' => [
'width' => $full_image[1],
'height' => $full_image[2],
],
'stored' => $using_ext_object_cache,
],
$thumbnail_image[0] => [
'input' => [],
'expected' => [
'width' => $thumbnail_image[1],
'height' => $thumbnail_image[2],
],
'stored' => false, // Never stored because dimensions are in the URL.
],
$external_image_with_dims_in_url => [
'input' => [],
'expected' => [
'width' => 1024,
'height' => 668,
],
'stored' => false, // Never stored because dimensions are in the URL.
],
$external_image_1 => [
'input' => [],
'expected' => [], // Nothing since we're only calling extract_by_filename_or_filesystem and not extract_by_downloading_images.
'stored' => false,
],
$external_image_2 => [
'input' => [
'width' => 1000,
'height' => 1000,
],
'expected' => [
'width' => 1000,
'height' => 1000,
],
'stored' => false, // Because dimensions already provided in input.
],
$image_with_query_string => [
'input' => [],
'expected' => [
'width' => 512,
'height' => 768,
],
'stored' => false, // Never stored because dimensions are in the URL.
],
$image_with_fragment => [
'input' => [],
'expected' => [
'width' => 512,
'height' => 768,
],
'stored' => false, // Never stored because dimensions are in the URL.
],
$image_with_query_string_and_frag => [
'input' => [],
'expected' => [
'width' => 512,
'height' => 768,
],
'stored' => false, // Never stored because dimensions are in the URL.
],
$internal_image_with_query_string => [
'input' => [],
'expected' => [
'width' => $full_image[1],
'height' => $full_image[2],
],
'stored' => $using_ext_object_cache,
],
$audio_file => [
'input' => [],
'expected' => [],
'stored' => false,
],
];

$input = wp_list_pluck( $data, 'input' );
$expected = wp_list_pluck( $data, 'expected' );
$stored = wp_list_pluck( $data, 'stored' );

$this->assertEmpty( AMP_Image_Dimension_Extractor::extract_by_filename_or_filesystem( [] ) );

$output = AMP_Image_Dimension_Extractor::extract_by_filename_or_filesystem( $input );
$this->assertEquals( $expected, $output );

foreach ( $stored as $url => $expected_transient ) {
list( $transient_name ) = $this->call_private_static_method(
AMP_Image_Dimension_Extractor::class,
'get_transient_names',
[ $url ]
);
if ( $expected_transient ) {
$this->assertInternalType( 'array', get_transient( $transient_name ), "Expected transient to be stored for $url." );
} else {
$this->assertFalse( get_transient( $transient_name ), "Expected no transient to be stored for $url." );
}
}
}
}