feat: support http range header #10

Merged
merged 42 commits into from
Mar 15, 2024
Changes from 27 commits
aa705a7
chore: limit body parameters to the types used
SgtPooki Mar 4, 2024
089ae24
chore: add response-header helper and tests
SgtPooki Mar 4, 2024
5af9252
feat: add range header parsing support
SgtPooki Mar 4, 2024
3b2e379
feat: verified-fetch supports range-requests
SgtPooki Mar 4, 2024
d805a51
test: fix dns test asserting test failure since we are catching it now
SgtPooki Mar 4, 2024
4d8e57d
fix: return 500 error when streaming unixfs content throws
SgtPooki Mar 4, 2024
aa25f0c
fix: cleanup code and unexecuting tests hiding errors
SgtPooki Mar 5, 2024
60b56c9
chore: some cleanup and code coverage
SgtPooki Mar 5, 2024
6da36fd
tmp: most things working
SgtPooki Mar 5, 2024
cac2b79
fix: stream slicing and test correctness
SgtPooki Mar 5, 2024
72618bc
chore: fixed some ByteRangeContext tests
SgtPooki Mar 6, 2024
698ee8f
test: add back header helpers
SgtPooki Mar 7, 2024
e413fa5
fix: unixfs tests are passing
SgtPooki Mar 7, 2024
96c7f00
fix: range-requests on raw content
SgtPooki Mar 7, 2024
deb2f2b
feat: tests are passing
SgtPooki Mar 7, 2024
f357a3d
chore: log string casing
SgtPooki Mar 7, 2024
83e80d8
chore: use 502 response instead of 500
SgtPooki Mar 7, 2024
121747b
chore: use libp2p/interface for types in src
SgtPooki Mar 7, 2024
05a6dfb
chore: failing to create range resp logs error
SgtPooki Mar 7, 2024
9dcd798
chore: Apply suggestions from code review
SgtPooki Mar 7, 2024
f296f0b
chore: fix broken tests from github PR patches (my own)
SgtPooki Mar 7, 2024
912ee47
chore: re-enable stream tests for ByteRangeContext
SgtPooki Mar 7, 2024
b0b6a4a
chore: clean up getBody a bit
SgtPooki Mar 8, 2024
f399bed
chore: ByteRangeContext getBody cleanup
SgtPooki Mar 8, 2024
607e5be
Merge branch 'main' into 9-heliaverified-fetch-http-range-request-sup…
SgtPooki Mar 8, 2024
eb0224b
chore: apply suggestions from code review
SgtPooki Mar 15, 2024
d1e6a82
fix: getSlicedBody uses correct types
SgtPooki Mar 15, 2024
07ab941
chore: remove extra stat call
SgtPooki Mar 15, 2024
ac621a2
chore: fix jsdoc with '*/'
SgtPooki Mar 15, 2024
46dc133
chore: fileSize is public property, but should not be used
SgtPooki Mar 15, 2024
36f6c96
test: fix blob comparisons that broke or were never working properly
SgtPooki Mar 15, 2024
acdd632
Merge branch 'main' into 9-heliaverified-fetch-http-range-request-sup…
SgtPooki Mar 15, 2024
b48c672
Merge branch 'main' into 9-heliaverified-fetch-http-range-request-sup…
SgtPooki Mar 15, 2024
5fc7ceb
chore: Update byte-range-context.ts
SgtPooki Mar 15, 2024
19c2713
chore: jsdoc cleanup
SgtPooki Mar 15, 2024
a1686a3
Revert "chore: fileSize is public property, but should not be used"
SgtPooki Mar 15, 2024
e7e3fd0
chore: jsdoc comments explaining .fileSize use
SgtPooki Mar 15, 2024
c184e2a
chore: isRangeRequest is public
SgtPooki Mar 15, 2024
d633456
chore: getters/setters update
SgtPooki Mar 15, 2024
314adca
chore: remove unnecessary _contentRangeHeaderValue
SgtPooki Mar 15, 2024
8837738
chore: ByteRangeContext uses setFileSize and getFileSize
SgtPooki Mar 15, 2024
3963006
chore: remove .stat changes that are no longer needed
SgtPooki Mar 15, 2024
2 changes: 2 additions & 0 deletions packages/verified-fetch/src/types.ts
@@ -1 +1,3 @@
export type RequestFormatShorthand = 'raw' | 'car' | 'tar' | 'ipns-record' | 'dag-json' | 'dag-cbor' | 'json' | 'cbor'

export type SupportedBodyTypes = string | ArrayBuffer | Blob | ReadableStream<Uint8Array> | null
Member

Picking nits, but Types is redundant in a type name and Supported is a business-logic decision, not a type, so just Body?

Suggested change
- export type SupportedBodyTypes = string | ArrayBuffer | Blob | ReadableStream<Uint8Array> | null
+ export type Body = string | ArrayBuffer | Blob | ReadableStream<Uint8Array> | null

Member Author

I would rather use something like ResponseBody, but I'm good with any.

Body could easily be mistaken for a type that comes from the builtin/global types, which could cause confusion. SupportedBody would be better, I guess, but it's explicitly informing devs reading the code that it's not just the typical Response.body types.

Member

ResponseBody would be fine. It's a minor point tbh.

291 changes: 291 additions & 0 deletions packages/verified-fetch/src/utils/byte-range-context.ts
@@ -0,0 +1,291 @@
import { calculateByteRangeIndexes, getHeader } from './request-headers.js'
import { getContentRangeHeader } from './response-headers.js'
import type { SupportedBodyTypes } from '../types.js'
import type { ComponentLogger, Logger } from '@libp2p/interface'

type SliceableBody = Exclude<SupportedBodyTypes, ReadableStream<Uint8Array> | null>

/**
* Gets the body size of a given body if it's possible to calculate it synchronously.
*/
function getBodySizeSync (body: SupportedBodyTypes): number | null {
if (typeof body === 'string') {
return body.length
}
if (body instanceof ArrayBuffer || body instanceof Uint8Array) {
return body.byteLength
}
if (body instanceof Blob) {
return body.size
}

if (body instanceof ReadableStream) {
return null
}

return null
}

function getByteRangeFromHeader (rangeHeader: string): { start: string, end: string } {
/**
* Range: bytes=<start>-<end> | bytes=<start2>- | bytes=-<end2>
*/
const match = rangeHeader.match(/^bytes=(?<start>\d+)?-(?<end>\d+)?$/)
if (match?.groups == null) {
throw new Error('Invalid range request')
}

const { start, end } = match.groups

return { start, end }
}

export class ByteRangeContext {
private readonly _isRangeRequest: boolean
private _fileSize: number | null | undefined
private readonly _contentRangeHeaderValue: string | undefined
private _body: SupportedBodyTypes | null = null
private readonly _rangeRequestHeader: string | undefined
private readonly log: Logger
private _isValidRangeRequest: boolean | null = null
private readonly requestRangeStart: number | null
private readonly requestRangeEnd: number | null
private byteStart: number | undefined
private byteEnd: number | undefined
private byteSize: number | undefined

constructor (logger: ComponentLogger, private readonly headers?: HeadersInit) {
this.log = logger.forComponent('helia:verified-fetch:byte-range-context')
this._rangeRequestHeader = getHeader(this.headers, 'Range')
if (this._rangeRequestHeader != null) {
this.log.trace('range request detected')
this._isRangeRequest = true
try {
const { start, end } = getByteRangeFromHeader(this._rangeRequestHeader)
this.requestRangeStart = start != null ? parseInt(start) : null
this.requestRangeEnd = end != null ? parseInt(end) : null
} catch (e) {
this.log.error('error parsing range request header: %o', e)
this.isValidRangeRequest = false
this.requestRangeStart = null
this.requestRangeEnd = null
}

this.setOffsetDetails()
} else {
this.log.trace('no range request detected')
this._isRangeRequest = false
this.requestRangeStart = null
this.requestRangeEnd = null
}
}

public setBody (body: SupportedBodyTypes): void {
this._body = body
// if fileSize was already set, don't recalculate it
this.fileSize = this.fileSize ?? getBodySizeSync(body)

this.log.trace('set request body with fileSize %o', this._fileSize)
}

public getBody (): SupportedBodyTypes {
const body = this._body
if (body == null) {
this.log.trace('body is null')
return body
}
if (!this.isRangeRequest || !this.isValidRangeRequest) {
this.log.trace('returning body unmodified for non-range, or invalid range, request')
return body
}
const byteStart = this.byteStart
const byteEnd = this.byteEnd
const byteSize = this.byteSize
if (byteStart != null || byteEnd != null) {
this.log.trace('returning body with byteStart=%o, byteEnd=%o, byteSize=%o', byteStart, byteEnd, byteSize)
if (body instanceof ReadableStream) {
// stream should already be spliced by `unixfs.cat`
return body
}
return this.getSlicedBody(body)
}

// we should not reach this point, but return body untouched.
this.log.error('returning unmodified body for valid range request')
return body
}

private getSlicedBody <T extends SliceableBody>(body: T): SliceableBody {
if (this.isPrefixLengthRequest) {
this.log.trace('sliced body with byteStart %o', this.byteStart)
return body.slice(this.offset) satisfies SliceableBody
}
if (this.isSuffixLengthRequest && this.length != null) {
this.log.trace('sliced body with length %o', -this.length)
return body.slice(-this.length) satisfies SliceableBody
}
const offset = this.byteStart ?? 0
const length = this.byteEnd == null ? undefined : this.byteEnd + 1
this.log.trace('returning body with offset %o and length %o', offset, length)

return body.slice(offset, length) satisfies SliceableBody
}

private get isSuffixLengthRequest (): boolean {
return this.requestRangeStart == null && this.requestRangeEnd != null
}

private get isPrefixLengthRequest (): boolean {
return this.requestRangeStart != null && this.requestRangeEnd == null
}

// sometimes, we need to set the fileSize explicitly because we can't calculate the size of the body (e.g. for unixfs content where we call .stat)
public set fileSize (size: number | bigint | null) {
this._fileSize = size != null ? Number(size) : null
this.log.trace('set _fileSize to %o', this._fileSize)
// when fileSize changes, we need to recalculate the offset details
this.setOffsetDetails()
}

public get fileSize (): number | null | undefined {
return this._fileSize
}
Member

If this class is just used internally, do we have to protect ourselves from ourselves with read-only properties like this?

Member Author

@SgtPooki Mar 15, 2024

@achingbrain

We don't have to, but it will help prevent future developers from mistakenly setting things they shouldn't without understanding the implications. Setters and getters state explicitly, in the way the code is implemented, that specific things have to be done when a value is applied, whereas a public non-readonly fileSize implies that it can be set at any time.

For example, we call setOffsetDetails when setting fileSize. Without that protection, the property can be set by anyone in the codebase: if it is public and not wrapped in a getter, it becomes writable externally. In reality the fileSize property should not be publicly writable, but making it public and removing the setter/getter forces it to be. We could use readonly and then override the protections inside the class, but that feels like a lot of work to get the encapsulation we already get by doing the conventional thing.

aside: IMHO setting and getting are cleaner than "setFileSize" and ".filesize" or "getFilesize"

Also... microsoft/TypeScript#37487

I addressed this with 46dc133 (#10), but we should think about reverting it.

Member

Ok, it makes sense for fileSize, that change should probably be reverted. What about _isRangeRequest? It's a readonly boolean that's returned from a getter for isRangeRequest.

My concern with getters/setters is that they usually start off with good intentions around data access, but the temptation is always there to do business-logic things in them, which means accessing properties can mutate state which in the long term can make codebases very hard to reason about.
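
To make the trade-off in this thread concrete, here is a minimal sketch of the explicit-method alternative (the later commits in this PR mention setFileSize and getFileSize, but the class below is a hypothetical stand-in, not the ByteRangeContext from the diff). The point is only that a method name can signal that derived state is recalculated, while reads stay free of side effects.

```typescript
// Hypothetical stand-in class for illustration; not the code from this PR.
class RangeState {
  private fileSize: number | null = null
  private byteEnd: number | undefined

  // An explicit method makes it visible at the call site that more than an
  // assignment happens: derived state is recalculated.
  setFileSize (size: number | null): void {
    this.fileSize = size
    this.recalculate()
  }

  // Reading never mutates anything.
  getFileSize (): number | null {
    return this.fileSize
  }

  getByteEnd (): number | undefined {
    return this.byteEnd
  }

  private recalculate (): void {
    // Assumption for this sketch: the last valid byte index is fileSize - 1.
    this.byteEnd = this.fileSize != null && this.fileSize > 0
      ? this.fileSize - 1
      : undefined
  }
}
```

With this shape, a caller that writes state.setFileSize(10) can see that work is being done, and state.getByteEnd() afterwards returns the recalculated index.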


public get isRangeRequest (): boolean {
return this._isRangeRequest
}

private isValidByteStart (): boolean {
if (this.byteStart != null) {
if (this.byteStart < 0) {
return false
}
if (this.fileSize != null && this.byteStart > this.fileSize) {
return false
}
}
return true
}

private isValidByteEnd (): boolean {
if (this.byteEnd != null) {
if (this.byteEnd < 0) {
return false
}
if (this.fileSize != null && this.byteEnd > this.fileSize) {
return false
}
}
return true
}

public set isValidRangeRequest (val: boolean) {
this._isValidRangeRequest = val
}

public get isValidRangeRequest (): boolean {
if (!this.isValidByteStart()) {
this.log.trace('invalid range request, byteStart is less than 0 or greater than fileSize')
this._isValidRangeRequest = false
} else if (!this.isValidByteEnd()) {
this.log.trace('invalid range request, byteEnd is less than 0 or greater than fileSize')
this._isValidRangeRequest = false
} else if (this.requestRangeEnd != null && this.requestRangeStart != null) {
// we may not have enough info.. base check on requested bytes
if (this.requestRangeStart > this.requestRangeEnd) {
this.log.trace('invalid range request, start is greater than end')
this._isValidRangeRequest = false
} else if (this.requestRangeStart < 0) {
this.log.trace('invalid range request, start is less than 0')
this._isValidRangeRequest = false
} else if (this.requestRangeEnd < 0) {
this.log.trace('invalid range request, end is less than 0')
this._isValidRangeRequest = false
}
}
this._isValidRangeRequest = this._isValidRangeRequest ?? true

return this._isValidRangeRequest
}

/**
* Given all the information we have, this function returns the offset that will be used when:
* 1. calling unixfs.cat
* 2. slicing the body
*/
public get offset (): number {
if (this.byteStart === 0) {
return 0
}
if (this.isPrefixLengthRequest || this.isSuffixLengthRequest) {
if (this.byteStart != null) {
// we have to subtract by 1 because the offset is inclusive
return this.byteStart - 1
}
}

return this.byteStart ?? 0
}

/**
* Given all the information we have, this function returns the length that will be used when:
* 1. calling unixfs.cat
* 2. slicing the body
*/
public get length (): number | undefined {
return this.byteSize ?? undefined
}

/**
* Converts a range request header into helia/unixfs supported range options
* Note that the gateway specification says we "MAY" support multiple ranges (https://specs.ipfs.tech/http-gateways/path-gateway/#range-request-header) but we don't
*
* Also note that @helia/unixfs and ipfs-unixfs-exporter expect length and offset to be numbers, the range header is a string, and the size of the resource is likely a bigint.
*
* SUPPORTED:
* Range: bytes=<range-start>-<range-end>
* Range: bytes=<range-start>-
* Range: bytes=-<suffix-length> // must pass size so we can calculate the offset. suffix-length is the number of bytes from the end of the file.
*
* NOT SUPPORTED:
* Range: bytes=<range-start>-<range-end>, <range-start>-<range-end>
* Range: bytes=<range-start>-<range-end>, <range-start>-<range-end>, <range-start>-<range-end>
*
* @see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range#directives
*/
private setOffsetDetails (): void {
if (this.requestRangeStart == null && this.requestRangeEnd == null) {
this.log.trace('requestRangeStart and requestRangeEnd are null')
return
}

const { start, end, byteSize } = calculateByteRangeIndexes(this.requestRangeStart ?? undefined, this.requestRangeEnd ?? undefined, this._fileSize ?? undefined)
this.log.trace('set byteStart to %o, byteEnd to %o, byteSize to %o', start, end, byteSize)
this.byteStart = start
this.byteEnd = end
this.byteSize = byteSize
}

/**
* This function returns the value of the "content-range" header.
* Content-Range: <unit> <range-start>-<range-end>/<size>
* Content-Range: <unit> <range-start>-<range-end>/*
* Content-Range: <unit> */<size>
* @see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Range
*/
public get contentRangeHeaderValue (): string {
if (this._contentRangeHeaderValue != null) {
return this._contentRangeHeaderValue
}
if (!this.isValidRangeRequest) {
this.log.error('cannot get contentRangeHeaderValue for invalid range request')
throw new Error('Invalid range request')
}

return getContentRangeHeader({
byteStart: this.byteStart,
byteEnd: this.byteEnd,
byteSize: this._fileSize ?? undefined
})
}
}
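
For readers following the contentRangeHeaderValue docs above, the three Content-Range forms can be sketched as follows. This is a rough illustration under the standard HTTP semantics (inclusive start and end indexes, '*' for an unknown size); formatContentRange is a hypothetical name and is not the getContentRangeHeader helper added in this PR.

```typescript
// Hypothetical helper for illustration; not the getContentRangeHeader from this PR.
function formatContentRange (start: number | undefined, end: number | undefined, size: number | undefined): string {
  // '*' stands in for an unknown total size
  const total = size != null ? String(size) : '*'

  if (start != null && end != null) {
    // satisfied range: both indexes are inclusive
    return `bytes ${start}-${end}/${total}`
  }
  if (size == null) {
    // 'bytes */*' is not a valid value, so there is nothing sensible to return
    throw new Error('cannot build Content-Range without a range or a size')
  }
  // unsatisfied range, typically sent with a 416 response
  return `bytes */${total}`
}

// Because the end index is inclusive, slicing needs end + 1.
const data = new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
const slice = data.slice(2, 5 + 1) // bytes 2, 3, 4, 5
const header = formatContentRange(2, 5, data.byteLength)
```

The end + 1 in the slice call is the same adjustment getSlicedBody makes when it turns byteEnd into the exclusive end argument of slice.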
@@ -11,7 +11,7 @@ export async function getStreamFromAsyncIterable (iterator: AsyncIterable<Uint8A
const { value: firstChunk, done } = await reader.next()

if (done === true) {
- log.error('No content found for path', path)
+ log.error('no content found for path', path)
throw new Error('No content found')
}

10 changes: 5 additions & 5 deletions packages/verified-fetch/src/utils/parse-url-string.ts
@@ -78,7 +78,7 @@ export async function parseUrlString ({ urlString, ipns, logger }: ParseUrlStrin
log.trace('resolved %s to %c from cache', cidOrPeerIdOrDnsLink, cid)
} else {
// protocol is ipns
- log.trace('Attempting to resolve PeerId for %s', cidOrPeerIdOrDnsLink)
+ log.trace('attempting to resolve PeerId for %s', cidOrPeerIdOrDnsLink)
let peerId = null

try {
@@ -90,16 +90,16 @@ export async function parseUrlString ({ urlString, ipns, logger }: ParseUrlStrin
ipnsCache.set(cidOrPeerIdOrDnsLink, resolveResult, 60 * 1000 * 2)
} catch (err) {
if (peerId == null) {
- log.error('Could not parse PeerId string "%s"', cidOrPeerIdOrDnsLink, err)
+ log.error('could not parse PeerId string "%s"', cidOrPeerIdOrDnsLink, err)
errors.push(new TypeError(`Could not parse PeerId in ipns url "${cidOrPeerIdOrDnsLink}", ${(err as Error).message}`))
} else {
- log.error('Could not resolve PeerId %c', peerId, err)
+ log.error('could not resolve PeerId %c', peerId, err)
errors.push(new TypeError(`Could not resolve PeerId "${cidOrPeerIdOrDnsLink}", ${(err as Error).message}`))
}
}

if (cid == null) {
- log.trace('Attempting to resolve DNSLink for %s', cidOrPeerIdOrDnsLink)
+ log.trace('attempting to resolve DNSLink for %s', cidOrPeerIdOrDnsLink)

try {
resolveResult = await ipns.resolveDns(cidOrPeerIdOrDnsLink, { onProgress: options?.onProgress })
@@ -108,7 +108,7 @@ export async function parseUrlString ({ urlString, ipns, logger }: ParseUrlStrin
log.trace('resolved %s to %c', cidOrPeerIdOrDnsLink, cid)
ipnsCache.set(cidOrPeerIdOrDnsLink, resolveResult, 60 * 1000 * 2)
} catch (err: any) {
- log.error('Could not resolve DnsLink for "%s"', cidOrPeerIdOrDnsLink, err)
+ log.error('could not resolve DnsLink for "%s"', cidOrPeerIdOrDnsLink, err)
errors.push(err)
}
}
51 changes: 51 additions & 0 deletions packages/verified-fetch/src/utils/request-headers.ts
@@ -0,0 +1,51 @@
export function getHeader (headers: HeadersInit | undefined, header: string): string | undefined {
if (headers == null) {
return undefined
}
if (headers instanceof Headers) {
return headers.get(header) ?? undefined
}
if (Array.isArray(headers)) {
const entry = headers.find(([key]) => key.toLowerCase() === header.toLowerCase())
return entry?.[1]
}
const key = Object.keys(headers).find(k => k.toLowerCase() === header.toLowerCase())
if (key == null) {
return undefined
}

return headers[key]
}

/**
* Given two ints from a Range header, and potential fileSize, returns:
* 1. number of bytes the response should contain.
* 2. the start index of the range. // inclusive
* 3. the end index of the range. // inclusive
*/
export function calculateByteRangeIndexes (start: number | undefined, end: number | undefined, fileSize?: number): { byteSize?: number, start?: number, end?: number } {
if (start != null && end != null) {
if (start > end) {
throw new Error('Invalid range')
}

return { byteSize: end - start + 1, start, end }
} else if (start == null && end != null) {
// suffix byte range requested
if (fileSize == null) {
return { end }
}
const result = { byteSize: end, start: fileSize - end + 1, end: fileSize }
return result
} else if (start != null && end == null) {
if (fileSize == null) {
return { start }
}
const byteSize = fileSize - start + 1
const end = fileSize
return { byteSize, start, end }
}

// both start and end are undefined
return { byteSize: fileSize }
}
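
As a point of comparison for reviewers, the following is a minimal sketch of how the three supported Range forms resolve to inclusive byte indexes under the standard HTTP range semantics (zero-based positions, last valid index is fileSize - 1, suffix ranges count back from the end). resolveByteRange and ResolvedRange are hypothetical names, and this is not the calculateByteRangeIndexes implementation from the diff; it also assumes the file size is already known.

```typescript
// Hypothetical helper names; this is not the code from the PR.
interface ResolvedRange {
  start: number // first byte index, inclusive
  end: number // last byte index, inclusive
  length: number // number of bytes in the response body
}

function resolveByteRange (rangeHeader: string, fileSize: number): ResolvedRange | null {
  // Single ranges only: bytes=<start>-<end> | bytes=<start>- | bytes=-<suffix-length>
  const match = /^bytes=(\d+)?-(\d+)?$/.exec(rangeHeader)
  if (match == null) {
    return null
  }
  const first = match[1] != null ? parseInt(match[1], 10) : undefined
  const second = match[2] != null ? parseInt(match[2], 10) : undefined

  let start: number
  let end: number
  if (first != null) {
    // explicit start; a missing or oversized end is clamped to the last byte
    start = first
    end = second != null ? Math.min(second, fileSize - 1) : fileSize - 1
  } else if (second != null) {
    // suffix form: the last <second> bytes of the file
    if (second === 0) {
      return null
    }
    start = Math.max(fileSize - second, 0)
    end = fileSize - 1
  } else {
    // "bytes=-" carries no information
    return null
  }

  if (start > end || start >= fileSize) {
    // unsatisfiable; a server would typically answer 416
    return null
  }
  return { start, end, length: end - start + 1 }
}
```

For a 10-byte file, bytes=0-4 resolves to indexes 0 through 4 (5 bytes), bytes=5- to indexes 5 through 9, and bytes=-3 to indexes 7 through 9. Returning null instead of throwing is a design choice for this sketch, so a caller can map it to an error response in one place.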