Add WARC-IP-Address header to WARCWriterChainProcessor #396
Comments
I don't think the IP is handy - A_DNS_SERVER_IP_LABEL is only populated for DNS requests. This explains why the original code looks up the address in the cache. We could probably do what that TODO:FIXME comment suggests, though, and have RecordingHttpClientConnection stash the exact IP in the Recorder, since it has access to the underlying Socket object when it wraps the input and output streams for recording.
Oh, except Recorder is in webarchive-commons. Gah.
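For illustration, a minimal sketch of reading the remote IP straight off the Socket, as suggested above. RemoteIpSketch and remoteIp are hypothetical names, not part of Heritrix or webarchive-commons, and where the value would be stashed (Recorder or elsewhere) is left open here:

```java
import java.net.InetAddress;
import java.net.Socket;

/** Hypothetical helper; not an existing Heritrix or webarchive-commons class. */
final class RemoteIpSketch {

    /**
     * Returns the textual IP the socket is (or was) connected to,
     * or null if the socket has never been connected.
     */
    static String remoteIp(Socket socket) {
        // Socket.getInetAddress() returns null for an unconnected socket.
        InetAddress addr = socket.getInetAddress();
        return addr == null ? null : addr.getHostAddress();
    }
}
```

Because this reads the address from the live connection, it would reflect the IP actually used for the fetch rather than whatever a cache lookup returns later.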
Looks like DNS records are also being written with ip/port label like
It also turns out
The original WARCWriterProcessor added a WARC-IP-Address header if the IP of the host is known:
heritrix3/modules/src/main/java/org/archive/modules/writer/WARCWriterProcessor.java
Line 278 in 37ce8d6
The newer WARCWriterChainProcessor uses a builder for HTTP responses, but this does not include the IP address:
heritrix3/modules/src/main/java/org/archive/modules/warc/HttpResponseRecordBuilder.java
Line 35 in 37ce8d6
Other builders do, like this:
heritrix3/modules/src/main/java/org/archive/modules/warc/DnsResponseRecordBuilder.java
Lines 42 to 45 in 37ce8d6
It's not clear the original complexity is needed:
heritrix3/modules/src/main/java/org/archive/modules/writer/WriterPoolProcessor.java
Lines 378 to 401 in 37ce8d6
But if the IP is handy, it should be included in the HTTP response.
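A rough sketch of the requested behaviour, assuming the IP is available as a string that may be null. WarcIpHeaderSketch and responseHeaders are hypothetical stand-ins using a plain Map, not the actual HttpResponseRecordBuilder API; only the header names come from the WARC format:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a response record builder; not the Heritrix API. */
final class WarcIpHeaderSketch {
    static final String HEADER_KEY_IP = "WARC-IP-Address";

    /**
     * Builds the named headers for a response record.
     * WARC-IP-Address is added only when an address is known.
     */
    static Map<String, String> responseHeaders(String targetUri, String ipAddress) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("WARC-Type", "response");
        headers.put("WARC-Target-URI", targetUri);
        if (ipAddress != null && !ipAddress.isEmpty()) {
            // Only emit the header when the IP is actually available.
            headers.put(HEADER_KEY_IP, ipAddress);
        }
        return headers;
    }
}
```

Omitting the header when the address is unknown keeps the record valid without writing a placeholder value.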