Harvest statistics and metadata from a URL or its source code (SEO oriented).
Implemented in Seo Pocket Crawler (source on GitHub).
Via Packagist
$ composer require piedweb/url-harvester
Harvest Methods:
use \PiedWeb\UrlHarvester\Harvest;
use \PiedWeb\UrlHarvester\Link;
$url = 'https://piedweb.com';
Harvest::fromUrl($url)
->getResponse()->getInfo('total_time') // load time
->getResponse()->getInfo('size_download')
->getResponse()->getStatusCode()
->getResponse()->getContentType()
->getRes...
->getTag('h1') // @return first tag content (could contain HTML)
->getUniqueTag('h1') // @return first tag content in UTF-8 (could contain HTML)
->getMeta('description') // @return string from content attribute or NULL
->getCanonical() // @return string|NULL
->isCanonicalCorrect() // @return bool
->getRatioTxtCode() // @return int
->getTextAnalysis() // @return \PiedWeb\TextAnalyzer\Analysis
->getKws() // @return the 10 most used words
->getBreadCrumb()
->indexable($userAgent = 'googlebot') // @return int corresponding to a const from Indexable
->getLinks()
->getLinks(Link::LINK_SELF)
->getLinks(Link::LINK_INTERNAL)
->getLinks(Link::LINK_SUB)
->getLinks(Link::LINK_EXTERNAL)
->getLinkedRessources() // @return array of all elements containing an href or a src attribute
->mayFollow() // checks headers and meta robots, @return bool
->getDomain()
->getBaseUrl()
->getRobotsTxt() // @return \Spatie\Robots\RobotsTxt or empty string
->setRobotsTxt($content) // @param string or RobotsTxt
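A minimal end-to-end sketch combining several of the methods listed above. It assumes Composer autoloading is in place, that Harvest::fromUrl() returns a Harvest instance on success (error handling is omitted), and that getLinks() returns a countable array; the $harvest variable name is illustrative only.

require 'vendor/autoload.php';

use \PiedWeb\UrlHarvester\Harvest;
use \PiedWeb\UrlHarvester\Link;

// Fetch and parse the page (assumes the request succeeds).
$harvest = Harvest::fromUrl('https://piedweb.com');

// Basic on-page data.
echo 'Status: '.$harvest->getResponse()->getStatusCode().PHP_EOL;
echo 'H1: '.$harvest->getUniqueTag('h1').PHP_EOL;
echo 'Description: '.$harvest->getMeta('description').PHP_EOL;
echo 'Canonical: '.$harvest->getCanonical().PHP_EOL;

// Link extraction, assuming getLinks() returns an array.
echo 'Internal links: '.count($harvest->getLinks(Link::LINK_INTERNAL)).PHP_EOL;
echo 'External links: '.count($harvest->getLinks(Link::LINK_EXTERNAL)).PHP_EOL;

// Indexability for a given user agent (int mapped to an Indexable constant).
echo 'Indexable (googlebot): '.$harvest->indexable('googlebot').PHP_EOL;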
$ composer test
Please see the contributing guidelines for details.
The MIT License (MIT). Please see License File for more information.