URLInfo
This class is meant to replace PHP's native `parse_url()` function. It still uses that function internally, but fixes a number of hard-to-catch issues and offers an object-oriented way to access the information.
- Unicode bug: depending on the selected locale, Unicode characters in a URL are replaced by underscores. The class circumvents the issue by temporarily selecting a locale that supports Unicode.
- Dots in parameter names: PHP's native query string parsing replaces dots in parameter names with underscores. The class uses a custom implementation to parse the URL's query string, so dots are preserved.
- Sanitizing: common user input mistakes are corrected, such as control characters left over from copy & pasting, or stray whitespace.
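To illustrate the dots issue, the following sketch uses only core PHP (not AppUtils). It shows how the native `parse_str()` renames a dotted parameter, and how splitting the query string by hand keeps the name intact. The `parseQueryKeepDots()` function is hypothetical and is not necessarily how `URLInfo` does it internally.

```php
<?php
// Native behavior: parse_str() replaces dots in parameter names with underscores.
parse_str('user.name=Alice&page=2', $native);
print_r(array_keys($native)); // user_name, page

/**
 * Hypothetical helper (not part of AppUtils): splits the query string
 * by hand so that dots in parameter names survive.
 * Simplified: array syntax such as "tags[]=a" is not handled.
 */
function parseQueryKeepDots(string $query): array
{
    $params = [];

    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments, e.g. from "a=1&&b=2"
        }

        // Split on the first "=" only, so values may contain "=".
        $parts = explode('=', $pair, 2);
        $name = urldecode($parts[0]);
        $params[$name] = isset($parts[1]) ? urldecode($parts[1]) : '';
    }

    return $params;
}

$custom = parseQueryKeepDots('user.name=Alice&page=2');
print_r(array_keys($custom)); // user.name, page
```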
To parse a URL, use the global function `parseURL()`:
```php
// Import the function to be able to call it without the namespace prefix
use function AppUtils\parseURL;

// Create an instance of the URLInfo class
$info = parseURL('http://www.foo.com');
```
The instance can then be used like an array, which makes it a handy drop-in replacement for the array returned by `parse_url()`, with the added benefit that you do not have to check whether a key exists first:
```php
$host = $info['host'];
```
NOTE: The `port` key has a value of `-1` if no port has been specified in the URL.
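For comparison, here is what the same URL looks like with the native function alone. This sketch uses only core PHP; the `-1` and empty-string fallbacks simply mirror the convention described above and are chosen for illustration.

```php
<?php
// Native parse_url() only returns keys for the parts present in the URL.
$parts = parse_url('http://www.foo.com');

var_dump(array_key_exists('port', $parts)); // bool(false)
var_dump(array_key_exists('path', $parts)); // bool(false)

// Reading a missing key directly triggers a warning (a notice in PHP 7),
// so plain parse_url() code needs a fallback for every optional part:
$port = $parts['port'] ?? -1;
$path = $parts['path'] ?? '';
```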
Accessing the URL as an array is only the start. Besides getters for all URL parts (for example `getHost()` and `getPath()`), `URLInfo` offers extended functionality such as:
- getNormalized: Get a clean URL with parameters ordered alphabetically
- getHash: Get a unique hash for the URL
- getHighlighted: Get a syntax highlighted version of the URL
- getParams: Get an associative array of query parameters in the URL
The `getNormalized()` method returns a sanitized version of the URL. Normalizing does the following:
- Strip out white space where there should be none, including newlines.
- Correctly URL encode parameters as needed.
- Order query parameters alphabetically.
As a result, two URLs with the same query parameters in a different order produce the same normalized string, so they can easily be recognized as identical:
```php
use function AppUtils\parseURL;

$urlA = parseURL('https://mistralys.eu?foo=bar&bar=foo');
$urlB = parseURL('https://mistralys.eu?bar=foo&foo=bar');

if($urlA->getNormalized() === $urlB->getNormalized()) {
    echo 'They are the same.';
}
```
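The parameter-ordering part of normalization can be sketched with core PHP functions. The `normalizeQueryOrder()` function below is hypothetical and deliberately simplified; it is not the AppUtils implementation, and it does not cover whitespace cleanup or the dots fix described earlier.

```php
<?php
/**
 * Hypothetical helper (not the AppUtils implementation): rebuilds a URL
 * with its query parameters sorted by name.
 * Simplified on purpose: it assumes an absolute URL with scheme and host,
 * drops user info, port and fragment, and relies on parse_str(), which
 * turns dots in parameter names into underscores.
 */
function normalizeQueryOrder(string $url): string
{
    $parts = parse_url($url);
    $base = $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '');

    if (!isset($parts['query'])) {
        return $base;
    }

    parse_str($parts['query'], $params);
    ksort($params); // alphabetical order by parameter name

    // http_build_query() also re-encodes the parameter values.
    return $base . '?' . http_build_query($params);
}

$a = normalizeQueryOrder('https://mistralys.eu?foo=bar&bar=foo');
$b = normalizeQueryOrder('https://mistralys.eu?bar=foo&foo=bar');

var_dump($a === $b); // bool(true)
```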
For common URL types, some utility methods are available as an alternative to checking the `scheme` key:
- `isPhone()`: Phone number link, e.g. `tel:+12345678`
- `isAnchor()`: Fragment link, e.g. `#jump`
- `isEmail()`: Mailto link, e.g. `mailto:name@address.domain` (with or without the `mailto:` scheme)
- `isURL()`: Regular `https` or `http` link
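To make the distinctions above concrete, the following sketch classifies the example links with plain string checks and core PHP functions. The `detectLinkType()` function and its return values are hypothetical; the actual `URLInfo` methods may apply different or stricter validation.

```php
<?php
/**
 * Hypothetical classifier (not part of AppUtils) mirroring the helper
 * methods listed above with plain string checks.
 */
function detectLinkType(string $link): string
{
    $link = trim($link);

    if (strncasecmp($link, 'tel:', 4) === 0) {
        return 'phone';
    }

    if (strncmp($link, '#', 1) === 0) {
        return 'anchor';
    }

    // E-mail addresses are accepted with or without the "mailto:" scheme.
    $address = strncasecmp($link, 'mailto:', 7) === 0 ? substr($link, 7) : $link;
    if (filter_var($address, FILTER_VALIDATE_EMAIL) !== false) {
        return 'email';
    }

    // parse_url() returns null (or false) when no scheme can be found.
    $scheme = strtolower((string) parse_url($link, PHP_URL_SCHEME));
    if ($scheme === 'http' || $scheme === 'https') {
        return 'url';
    }

    return 'other';
}
```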
You can also specify query parameter names to ignore when normalizing the URL: they are simply left out.

```php
use function AppUtils\parseURL;

echo parseURL('https://mistralys.eu?foo=bar&bar=foo')
    ->excludeParam('bar')
    ->getNormalized();
```

Output:

```
https://mistralys.eu?foo=bar
```
Excluded parameters are also taken into account when highlighting URLs: they are either left out (the default), or included but highlighted as ignored. An optional explanation can be passed as the second parameter of `excludeParam()`; it is used as the tooltip for the parameter.
```php
use function AppUtils\parseURL;

echo parseURL('https://mistralys.eu?foo=bar&bar=foo')
    ->excludeParam('bar', 'This parameter is not needed anymore.')
    ->setHighlightExcluded(true) // Enable highlighting of excluded parameters
    ->getHighlighted();
```
Query parameters can easily be added, removed or overwritten:
```php
use function AppUtils\parseURL;

echo parseURL('https://mistralys.eu?foo=bar&message=Hello')
    ->removeParam('foo')
    ->setParam('bar', 'foo') // Add the parameter
    ->setParam('message', 'Bye') // Overwrite the value
    ->getNormalized();
```

Output:

```
https://mistralys.eu?bar=foo&message=Bye
```
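For readers used to the native functions, the same result can be reproduced by hand on a plain parameter array. This is only a comparison sketch in core PHP, not how `URLInfo` works internally; it assumes parameter names without dots, since `parse_str()` would rename those.

```php
<?php
// Hypothetical plain-PHP equivalent of removeParam()/setParam() followed
// by getNormalized(), working on the query parameters as an array.
parse_str('foo=bar&message=Hello', $params);

unset($params['foo']);      // like removeParam('foo')
$params['bar'] = 'foo';     // add a new parameter
$params['message'] = 'Bye'; // overwrite an existing value

ksort($params);             // alphabetical order, as when normalizing
$url = 'https://mistralys.eu?' . http_build_query($params);

echo $url; // https://mistralys.eu?bar=foo&message=Bye
```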
URLs that consist of a single host name (such as `hostname` instead of `www.foo.com`) cannot be reliably recognized. To allow such hosts to be recognized, they must be added to the list of known hosts:
```php
use function AppUtils\parseURL;
use AppUtils\URLInfo\URLHosts;

parseURL('hostname')->isValid(); // FALSE

URLHosts::addHost('hostname');

parseURL('hostname')->isValid(); // TRUE
```
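The ambiguity is visible with the native function: without a scheme, `parse_url()` reports a bare name as a path, not a host. This sketch only illustrates that behavior in core PHP; it does not show how `URLHosts` resolves it internally.

```php
<?php
// Without a scheme, parse_url() cannot tell a bare host name from a path.
$bare = parse_url('hostname');

// With a scheme, the same name is reported as the host.
$withScheme = parse_url('http://hostname');

print_r($bare);       // only a "path" key, set to "hostname"
print_r($withScheme); // "scheme" => "http", "host" => "hostname"
```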
NOTE: The special host name `localhost` is recognized natively.
A number of URI schemes are included by default (like `http`, `https`, `mailto`, etc.), but custom or rarely used schemes are considered invalid. Missing schemes can easily be added to the global list:
```php
use AppUtils\URLInfo\URISchemes;
use function AppUtils\parseURL;

parseURL('custom://mistralys.eu')->isValid(); // FALSE

URISchemes::addScheme('custom://');

parseURL('custom://mistralys.eu')->isValid(); // TRUE
```
To account for the different scheme notations, schemes must be specified with the colon and, where applicable, the slashes. `mailto:` and `tel:`, for example, do not need the slashes.
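One way to see why the exact notation matters is a registry that matches schemes as string prefixes. The `SchemeRegistry` class and its default list below are hypothetical, written only to illustrate the idea; they are not the `URISchemes` implementation. Requires PHP 7.4+ for the typed property.

```php
<?php
/**
 * Hypothetical scheme registry (not the URISchemes implementation).
 * Schemes are matched as case-insensitive string prefixes, so each one
 * must be written exactly as it appears in URLs: "custom://" with
 * slashes, "mailto:" without.
 */
final class SchemeRegistry
{
    /** @var string[] Assumed default list, for illustration only. */
    private static array $schemes = ['http://', 'https://', 'mailto:', 'tel:'];

    public static function addScheme(string $scheme): void
    {
        $scheme = strtolower($scheme);

        if (!in_array($scheme, self::$schemes, true)) {
            self::$schemes[] = $scheme;
        }
    }

    public static function isKnown(string $url): bool
    {
        foreach (self::$schemes as $scheme) {
            if (strncasecmp($url, $scheme, strlen($scheme)) === 0) {
                return true;
            }
        }

        return false;
    }
}

$before = SchemeRegistry::isKnown('custom://mistralys.eu'); // false
SchemeRegistry::addScheme('custom://');
$after = SchemeRegistry::isKnown('custom://mistralys.eu');  // true
```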