3.7.0

ato released this 03 Feb 05:26
· 7 commits to master since this release

Download distribution zip (or tar.gz)

Full Changelog | Javadoc | Maven Central

New Features

  • Groovy crawl configs (experimental): Spring's Groovy Bean Definition DSL can now be used as an experimental alternative to Spring XML. This allows terser, more readable job configuration with inline scripting. There is no user interface for it in this release; for now, you must manually create a crawler-beans.groovy file in your job directory. #632

  • ExtractorHTML obeyRelNofollow: This option skips extraction of links marked rel=nofollow. This is useful for avoiding crawler traps on some sites. #638
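
As a rough illustration of the two features above used together, here is a minimal sketch of a crawler-beans.groovy fragment. It assumes the file uses Spring's standard `beans { ... }` closure, that the bean name `extractorHtml` matches the id used in the stock XML configuration, and that the new option is exposed as a boolean property named `obeyRelNofollow`. It is not a complete crawl configuration; a working job needs the remaining beans as well.

```groovy
// crawler-beans.groovy (fragment only, not a complete crawl configuration)
import org.archive.modules.extractor.ExtractorHTML

beans {
    // Bean name assumed to mirror the "extractorHtml" id from the XML config
    extractorHtml(ExtractorHTML) {
        // Assumed property name, taken from the option name in these notes:
        // skip extraction of links marked rel=nofollow
        obeyRelNofollow = true
    }
}
```

Because the Groovy support is experimental, check the linked pull requests for the exact syntax Heritrix expects before relying on this layout.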

Fixes

  • Cookie rejected warning: The slf4j change in 3.6.0 inadvertently caused a previously hidden warning to be logged to job.log when a server sends a Set-Cookie header with a disallowed domain value. This warning is now suppressed since it occurs frequently and does not require any action from the crawl operator. #640

Changes

  • Removed fastutil: A small number of usages of fastutil were replaced with standard library equivalents in webarchive-commons and Heritrix. This reduced the Heritrix distribution size from 51 MB to 34 MB. iipc/webarchive-commons#101

Dependency Upgrades

  • amqp-client 5.24.0
  • commons-codec 1.17.2
  • ftpserver-core 1.2.1
  • freemarker 2.3.34
  • jetty 9.4.57.v20241219
  • jsch 0.2.22
  • restlet 2.5.0
  • spring 6.1.16
  • webarchive-commons 1.3.0