
URLInfo

Mistralys edited this page Jul 22, 2022 · 15 revisions

This class is meant to replace PHP's native parse_url function. It still uses this function internally, but fixes a number of issues that are hard to catch, and offers an object-oriented way to access the information.

Fixes & enhancements of PHP's parse_url behavior

  • Unicode bug: depending on the selected locale, Unicode characters in a URL are replaced by underscores. The class works around the issue by temporarily selecting a locale that supports Unicode.
  • Dots in parameter names: the class uses a custom implementation to parse the URL's query string, to work around the limitation of not being able to use dots in parameter names.
  • Sanitizing: common user input mistakes are corrected, such as removing control characters (from copying and pasting) and cleaning up whitespace.
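
To illustrate the dot limitation mentioned above, here is a minimal sketch using only native PHP. The parseQueryKeepingDots helper is hypothetical and is not the library's parser; it only shows one way a custom parser can keep dots in parameter names. It deliberately ignores array notation such as foo[]=bar.

```php
<?php
// Native behavior: parse_str() rewrites dots in parameter names to underscores.
parse_str('user.name=jane&page=2', $native);
// $native now has the keys "user_name" and "page".

/**
 * Hypothetical helper (not part of the library): splits a query string
 * by hand so that parameter names are kept exactly as written.
 */
function parseQueryKeepingDots(string $query): array
{
    $result = [];

    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments such as a trailing "&"
        }

        // Split into name and value; a missing value becomes an empty string.
        $tokens = explode('=', $pair, 2);
        $result[urldecode($tokens[0])] = urldecode($tokens[1] ?? '');
    }

    return $result;
}

$custom = parseQueryKeepingDots('user.name=jane&page=2');
// $custom keeps the key "user.name".
```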

Basic usage

To parse a URL, use the global function parseURL.

// import the function to be able to call it without the prefix
use function AppUtils\parseURL;

// create an instance of the URLInfo class
$info = parseURL('http://www.foo.com');

The returned object can be accessed like an array, which makes it a handy drop-in replacement for the array returned by parse_url, with the added benefit of not having to check whether the target key exists.

$host = $info['host'];

NOTE: The port number will have a value of -1 if no port has been specified in the URL.
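
For comparison, this is what the key check looks like with the native function alone. The sketch uses only parse_url; the -1 fallback simply mirrors the convention described in the note and is written by hand here.

```php
<?php
// Native parse_url() only returns the keys that are present in the URL.
$parts = parse_url('http://www.foo.com');

// Neither "port" nor "path" exists in the result for this URL, so each
// access has to be guarded to avoid an undefined key warning.
$hasPort = isset($parts['port']);

// Hand-written fallback that mimics the -1 convention for a missing port.
$port = $parts['port'] ?? -1;
$host = $parts['host'] ?? '';
```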

Utility methods

Beyond array access, the URLInfo object offers a number of utility methods.

Getter methods

Beyond the getters for all URL parts (for example getHost, getPath...), some extended functionality includes the following:

  • getNormalized: Get a clean URL with parameters ordered alphabetically
  • getHash: Get a unique hash for the URL
  • getHighlighted: Get a syntax highlighted version of the URL
  • getParams: Get an associative array of query parameters in the URL

Normalizing URLs

The getNormalized() method returns a sanitized version of the URL. Normalizing does the following things:

  • Strip out whitespace where there should be none, including newlines.
  • Correctly URL-encode parameters as needed.
  • Order query parameters alphabetically.

This guarantees that two URLs that have the same query parameters, but in a different order, can be easily recognized as being the same, like this:

use function AppUtils\parseURL;

$urlA = parseURL('https://mistralys.eu?foo=bar&bar=foo');
$urlB = parseURL('https://mistralys.eu?bar=foo&foo=bar');

if($urlA->getNormalized() === $urlB->getNormalized()) {
  echo 'They are the same.';
}
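
The parameter ordering step can be approximated with native functions, as in the sketch below. The sortQueryParams function is hypothetical and is not how getNormalized() is implemented: it skips whitespace cleanup, and because it relies on parse_str it would also turn dots in parameter names into underscores.

```php
<?php
/**
 * Hypothetical helper (not part of the library): returns the URL with
 * its query parameters ordered alphabetically by name.
 */
function sortQueryParams(string $url): string
{
    $parts = parse_url($url);

    // Leave malformed URLs and URLs without a query string untouched.
    if ($parts === false || !isset($parts['query'])) {
        return $url;
    }

    parse_str($parts['query'], $params);
    ksort($params);

    // Everything before the first "?" stays exactly as it was.
    $base = explode('?', $url, 2)[0];
    $fragment = isset($parts['fragment']) ? '#' . $parts['fragment'] : '';

    return $base . '?' . http_build_query($params, '', '&') . $fragment;
}

$sortedA = sortQueryParams('https://mistralys.eu?foo=bar&bar=foo');
$sortedB = sortQueryParams('https://mistralys.eu?bar=foo&foo=bar');
// Both variables now hold the same string.
```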

Detecting the type

For common URL schemes, some utility methods are available beyond checking the scheme key:

  • isPhone() - Phone number link tel:+12345678
  • isAnchor() - Fragment link #jump
  • isEmail() - Mailto link mailto:name@address.domain (with or without mailto: scheme)
  • isURL() - Regular https or http link
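
To give a rough idea of what such a classification involves, here is a sketch built from native PHP functions only. The guessLinkType function and its return values are hypothetical; the library's own detection may work differently and is likely more thorough.

```php
<?php
/**
 * Hypothetical helper (not part of the library): guesses the link type
 * from its prefix, falling back to an email address check.
 */
function guessLinkType(string $link): string
{
    $link = trim($link);

    if ($link !== '' && $link[0] === '#') {
        return 'anchor';
    }

    if (stripos($link, 'tel:') === 0) {
        return 'phone';
    }

    // Accept both "mailto:name@address.domain" and a bare email address.
    if (stripos($link, 'mailto:') === 0
        || filter_var($link, FILTER_VALIDATE_EMAIL) !== false) {
        return 'email';
    }

    $scheme = parse_url($link, PHP_URL_SCHEME);
    if (is_string($scheme) && in_array(strtolower($scheme), ['http', 'https'], true)) {
        return 'url';
    }

    return 'unknown';
}
```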

Parameter exclusion

It is possible to add query parameter names that should be ignored when normalizing the URL: they will simply be left out.

use function AppUtils\parseURL;

echo parseURL('https://mistralys.eu?foo=bar&bar=foo')
  ->excludeParam('bar')
  ->getNormalized();

Output:

https://mistralys.eu?foo=bar

This feature is taken into account when highlighting URLs: excluded parameters can either be left out (default behavior), or included but specially highlighted as ignored. An optional explanation text can be added, which will be used as a tooltip for the parameter.

use function AppUtils\parseURL;

echo parseURL('https://mistralys.eu?foo=bar&bar=foo')
  ->excludeParam('bar', 'This parameter is not needed anymore.')
  ->setHighlightExcluded(true) // enable highlighting excluded params
  ->getHighlighted();

Adding and removing parameters

Query parameters can easily be added, removed or overwritten:

use function AppUtils\parseURL;

echo parseURL('https://mistralys.eu?foo=bar&message=Hello')
  ->removeParam('foo')
  ->setParam('bar', 'foo') // Add the param
  ->setParam('message', 'Bye') // Overwrite the value
  ->getNormalized();

Output:

https://mistralys.eu?bar=foo&message=Bye

Adding hosts to recognize

URLs that consist only of a single host name (such as hostname) cannot be reliably recognized. To allow such hosts to be recognized, they must be added to the known hosts list:

use function AppUtils\parseURL;
use AppUtils\URLInfo\URLHosts;

parseURL('hostname')->isValid(); // FALSE

URLHosts::addHost('hostname');

parseURL('hostname')->isValid(); // TRUE

NOTE: The special host name localhost is available natively.

Adding URI schemes to recognize

A number of URI schemes are included by default (like http, https, mailto, etc.), but custom or rarely used schemes will be considered invalid. Missing schemes can easily be added to the global list:

use AppUtils\URLInfo\URISchemes;
use function AppUtils\parseURL;

parseURL('custom://mistralys.eu')->isValid(); // FALSE

URISchemes::addScheme('custom://');

parseURL('custom://mistralys.eu')->isValid(); // TRUE

To account for different scheme notations, schemes must be specified including the colon and, where applicable, the slashes. mailto: and tel:, for example, do not use slashes.
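
One reason this notation is convenient is that a scheme written this way can be matched as a plain prefix. The sketch below shows that idea with a hypothetical detectScheme function and a hand-written list; it is an assumption about how such a list could be used, not the library's implementation.

```php
<?php
// Hypothetical scheme list, written with the colon and, where applicable, the slashes.
$knownSchemes = ['http://', 'https://', 'mailto:', 'tel:'];

/**
 * Hypothetical helper (not part of the library): returns the first
 * known scheme the URL starts with, or null if none matches.
 */
function detectScheme(string $url, array $schemes): ?string
{
    foreach ($schemes as $scheme) {
        // Case-insensitive prefix match against the full notation.
        if (stripos($url, $scheme) === 0) {
            return $scheme;
        }
    }

    return null;
}

$before = detectScheme('custom://mistralys.eu', $knownSchemes); // not in the list yet

$knownSchemes[] = 'custom://';
$after = detectScheme('custom://mistralys.eu', $knownSchemes);
```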

