Bring CUDA support to Tracking.jl #33

Merged: 105 commits, Nov 17, 2021
Commits
64b0064
initial commit in the gpu branch
Apr 12, 2020
5aad150
GPU function blueprints for downcovert & car. rep.
Apr 25, 2020
4d602f9
Gpu carrier generation draft
coezmaden Apr 27, 2020
f1952f5
downconvert loop blueprint
May 13, 2020
808b7d6
Implement carrier replica gpu function
coezmaden May 13, 2020
d905623
Implement code replica and corr on GPU
May 19, 2020
9cbca18
Delete unnecessary comments
May 23, 2020
c4e1616
Reflect the development in readme
May 23, 2020
a9869a3
Adjust Gain Controlled Signal for a GPU Signal
May 24, 2020
43c07fd
Fix constructor, make use of vector operations
May 24, 2020
186acde
Merge pull request #1 from JuliaGNSS/master
coezmaden May 26, 2020
0c0b49e
Remove .vscode garbage, adjust global gitignore
May 31, 2020
4217d83
Change Float16 occurances to Float32 due to perf.
May 31, 2020
ad2b607
Change GPU carrier_replica to StructArray, adjust the calculation method
Jun 1, 2020
82bcf0f
Merge pull request #2 from ozmaden/master
coezmaden Jun 1, 2020
ecc981e
Complete the GPU downconvert function
Jun 1, 2020
c8fc7a4
Add CUDA package dependency
coezmaden Jun 23, 2020
80e11e4
GPU correlation anc code replica blueprint
Jun 25, 2020
b8ae78b
Merge branch 'feature/gpu-accelerated-correlation' of https://github.…
Jun 25, 2020
4cb1e05
Fix syntax
coezmaden Jun 25, 2020
e263962
Correlation using dot product on GPU
coezmaden Jun 26, 2020
269e956
Fix GPU correlate parameter types
coezmaden Jun 26, 2020
9464364
Add new dependencies
coezmaden Jun 26, 2020
2927448
Fix algorithm, use @views macro for performance
coezmaden Jun 26, 2020
ea93620
Optimize carrier generation by taking fewer steps
coezmaden Jun 29, 2020
7cdacfa
Union types for the carrier and code
coezmaden Jun 29, 2020
a5b8c31
Gain control for the gpu signal implemented
coezmaden Jun 30, 2020
629b96b
GPU code replica, GNSSSignals 28d32c4324e40a0b93391b06820deea98112a02d
Jul 2, 2020
16e69ed
Merge pull request #3 from JuliaGNSS/master
coezmaden Jul 31, 2020
eae463d
Merge pull request #4 from ozmaden/master
coezmaden Jul 31, 2020
35d35d0
Merge pull request #5 from JuliaGNSS/master
coezmaden Oct 22, 2020
4f3120c
Add functions from ozmaden/GNSSBenchmarks.jl
coezmaden Nov 13, 2020
b1d7b56
Reflect changes under GNSSSignals#feature/gpu
coezmaden Dec 19, 2020
e0fc930
Functioning GPU TrackingState
Dec 20, 2020
7c55ee5
account for CPU TrackingState
Dec 20, 2020
15aa324
Reflect GNSSSignals changes for tracking_loop
Dec 20, 2020
22a40ba
Update README for the GPSL1 struct change
Dec 20, 2020
a5aac85
Enforce AbstractArray
Dec 20, 2020
b9a40bd
AGC for CUDA signals
Dec 20, 2020
dc8d766
functioning GPU tracking loop
Dec 20, 2020
26db618
rectify start_sample
Dec 20, 2020
92ec34a
Fix resize problems
Dec 21, 2020
1eceefe
Stylistic change, variable names small letters
Dec 21, 2020
dfced23
Replace mutiple function calls with a variable
Dec 21, 2020
1cf107c
Remove conditional use_gpu flag, as it's taken care of in GNSSSignals.jl
Dec 21, 2020
7a0b6cf
cleanup residual errors
Dec 21, 2020
36cdd2c
Merge pull request #7 from JuliaGNSS/master
coezmaden Dec 25, 2020
5f80968
Merge branch 'feature/gpu-accelerated-correlation' into master
coezmaden Dec 25, 2020
d111f2a
Merge pull request #8 from ozmaden/master
coezmaden Dec 25, 2020
9ffc0f6
Fix tracking_loop trunc inexact error
Jan 5, 2021
4652695
Fix CPU tracking loop
Jan 5, 2021
5904d04
Implement GPU StructArray gen_carrier_replica!
coezmaden Jan 12, 2021
7dade1f
Implement GPU StructArray correlate
coezmaden Jan 13, 2021
5e52117
Implement GPU StructArray downconvert!
coezmaden Jan 13, 2021
a1b3df8
Allow for both CuArray and StructArray of CuArrays tracking loop
coezmaden Jan 15, 2021
3b53bd8
Performance improvement for the CuArray correlator, implement dot pro…
coezmaden Jan 16, 2021
bff0e07
Performance improvements for the StructArray of CuArrays correlate, i…
coezmaden Jan 16, 2021
5dd429a
Merge branch 'feature/gpu-accelerated-correlation' of github.com:ozma…
coezmaden Jan 16, 2021
149e697
Performance improvement for the CuArray correlate
coezmaden Jan 16, 2021
c2e1a34
Create match_size_to_signal! function that checks if resizing is need…
coezmaden Jan 16, 2021
c21f3c1
Delete extra match_size_to_signal! definitions, fix dot products, imp…
coezmaden Jan 20, 2021
39e0424
Remove Loop Vectorization compat
coezmaden Jul 16, 2021
9c91eb7
Merge branch 'master' of https://github.com/JuliaGNSS/Tracking.jl int…
coezmaden Jul 22, 2021
a719cc2
Merge branch 'JuliaGNSS-master' into feature/gpu_kernels
coezmaden Jul 22, 2021
8d30a28
Reflect changes in JuliaGNSS:master
coezmaden Jul 22, 2021
fa580d3
GPU TrackingState, DownconvertedSignalGPU, CarrierReplicaGPU
coezmaden Jul 29, 2021
ad0f826
GPU tracking state initializes iff signal is known
coezmaden Oct 22, 2021
2c38ed0
GPU Tracking State doesn't need code, insert the main kernel
coezmaden Oct 24, 2021
7fcc4a9
Merge pull request #13 from JuliaGNSS/master
coezmaden Oct 24, 2021
4023b64
Fix phase error in kernel; kernel works for start:end signal; Trackin…
coezmaden Oct 28, 2021
92f7bb2
Checks for type equality of system.codes and signal, signal structarr…
coezmaden Oct 28, 2021
a0e912a
GPU TrackingState testset
coezmaden Oct 28, 2021
345e03f
GPU tracking results testset
coezmaden Oct 28, 2021
7ccdb91
GPU tracking_loop testset, add CUDA to test name
coezmaden Oct 29, 2021
93ce7a3
GPU bit detector testset
coezmaden Oct 29, 2021
911c04d
GPU GPSL5 testset
coezmaden Oct 29, 2021
468c9b0
GPU GPSL1 testset
coezmaden Oct 29, 2021
9646328
GPU GalileoE1B testset
coezmaden Oct 29, 2021
7b60fe0
GPU discriminators testset
coezmaden Oct 29, 2021
e80b4c0
GPU CN0 estimation testset
coezmaden Oct 29, 2021
480f283
GPU BOC testset
coezmaden Oct 29, 2021
6256c92
GPU bit buffer testset
coezmaden Oct 29, 2021
002dd2d
Fix phase calculation (multiples of 2pi)
coezmaden Nov 7, 2021
c9410ef
Add CUDA tests to runtests includes
coezmaden Nov 7, 2021
c7b31b6
Allow scalar indexing for cn0_estimation test
coezmaden Nov 7, 2021
2bd5482
Allowscalar deprecation
coezmaden Nov 10, 2021
0ce8e02
Solve scalar indexing in accumaltor results
coezmaden Nov 10, 2021
363159e
Fix GPU multi antenna tracking state
coezmaden Nov 10, 2021
9d278e0
Seperate functions for matrix and vector cases
coezmaden Nov 14, 2021
b4d66c1
Allowscalar for tracking loop tests
coezmaden Nov 14, 2021
10d7960
Remove CUDA broadcasting functions, clean comments
coezmaden Nov 14, 2021
63d21d2
Update readme with a `CUDA.jl` example
coezmaden Nov 14, 2021
465b422
Merge ozmaden/Tracking#14
coezmaden Nov 14, 2021
2c235dc
Merge branch 'master' of git://github.com/JuliaGNSS/Tracking.jl into …
coezmaden Nov 14, 2021
504afbb
Merge branch 'JuliaGNSS-master' into feature/gpu_kernels
coezmaden Nov 14, 2021
e5a8398
Adjust GPU functions according to the change https://github.com/Julia…
coezmaden Nov 14, 2021
eb51208
Make CUDA test names consistent
coezmaden Nov 14, 2021
e45cd77
Add multiple antenna GPU test
coezmaden Nov 14, 2021
8f3464d
Fix examples according to https://github.com/JuliaGNSS/Tracking.jl/pu…
coezmaden Nov 14, 2021
90358a9
Check for signal and codes type consistency
coezmaden Nov 14, 2021
81d94c3
Add Julia BuildKite CI for CUDA tests
Nov 16, 2021
ccb75f0
Remove leftovers
coezmaden Nov 17, 2021
2a24f7d
Remove unnecessary structs
coezmaden Nov 17, 2021
eab881c
Remove the unnecessary carrier vector
coezmaden Nov 17, 2021
724dcf6
Remove unused functions and duplicates
coezmaden Nov 17, 2021
16 changes: 16 additions & 0 deletions .buildkite/pipeline.yml
@@ -0,0 +1,16 @@
env:
  SECRET_CODECOV_TOKEN: "Q3fuMdJjaQy9h/uk43rwSqz8M6ulvlCedU2Ir0S3QLP4t9F8cf7pzrTkX+nVhkGycZ/r5FRtTOwPr445R3wK5v9mEAsJN5GMOgI5w/L8m2XDwLmW3PN8RMno+fm2JVxZyPMNNmIQqbYEmmQcBS6Q3nywW3xi0Cl5umJuwDB+NdOFbpq3wc2wrnbOAbwlBJoCJmlH+F4ncuVY6EMmsgNKAf9RqUNWQxIthG616X1cNwuYEpL4dO/PWY2GMXWXTQ8ndO/713p4b5yIlzDP0mr2MrO+1A5fhgPc7Vr+f9mUlIAx+9AsWQYPrqPTkr2L5+mfaTodVE3u2Cop877WJZQD7w==;U2FsdGVkX1/wk2jzfWlRZ66IWgionQK/5Fu0pg3u0b26hhmmMjAjOklyi7QZKhJHjjt4KjK/dJzhd3eK28S0qQ=="

steps:
  - label: "Julia v1.6"
    plugins:
      - JuliaCI/julia#v1:
          version: "1.6"
      - JuliaCI/julia-test#v1: ~
      - JuliaCI/julia-coverage#v1:
          codecov: true
    agents:
      queue: "juliagpu"
      cuda: "*"
    if: build.message !~ /\[skip tests\]/
    timeout_in_minutes: 60
1 change: 1 addition & 0 deletions Project.toml
@@ -4,6 +4,7 @@ authors = ["Soeren Zorn <soeren.zorn@nav.rwth-aachen.de>"]
version = "0.14.8"

[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
DocStringExtensions = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
GNSSSignals = "52c80523-2a4e-5c38-8979-05588f836870"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
32 changes: 28 additions & 4 deletions README.md
@@ -13,6 +13,7 @@ This implements a basic tracking functionality for GNSS signals. The correlation
* Secondary code detection
* Bit detection
* Phased array tracking
* GPU acceleration (CUDA)

## Getting started

@@ -25,15 +26,17 @@ pkg> add Tracking
## Usage

```julia
using GNSSSignals
using Tracking
using Tracking: Hz, GPSL1
using Tracking: Hz
carrier_doppler = 1000Hz
code_phase = 50
sampling_frequency = 2.5e6Hz
prn = 1
state = TrackingState(GPSL1, carrier_doppler, code_phase)
results = track(signal, state, prn, sampling_frequency)
next_results = track(next_signal, get_state(results), prn, sampling_frequency)
gpsl1 = GPSL1()
state = TrackingState(prn, gpsl1, carrier_doppler, code_phase)
results = track(signal, state, sampling_frequency)
next_results = track(next_signal, get_state(results), sampling_frequency)
```

If you'd like to track several signals at once (e.g. in the case of phased antenna arrays), you will have to specify the optional parameter `num_ants::NumAnts{N}` and pass a beamforming function to the `track` function:
@@ -42,3 +45,24 @@ If you'd like to track several signals at once (e.g. in the case of phased anten
state = TrackingState(prn, gpsl1, carrier_doppler, code_phase, num_ants = NumAnts(4)) # 4 antenna channels
results = track(signal, state, sampling_frequency, post_corr_filter = x -> x[1]) # Post corr filter is optional
```

### Usage with `CUDA.jl`
This package can accelerate the tracking loop on the GPU. At the moment, only `CUDA.jl` is supported. To use this option, opt in by passing the following argument when creating an `AbstractGNSS`:
```julia
gpsl1_gpu = GPSL1(use_gpu = Val(true))
```
Note that `num_samples` must be provided explicitly when creating a `TrackingState`:
```julia
state_gpu = TrackingState(prn, gpsl1_gpu, carrier_doppler, code_phase, num_samples = N)
```
Moreover, the signal must be a `StructArray{ComplexF32}` whose `re` and `im` components are `CuArray{Float32}`s:
```julia
using CUDA, StructArrays
signal_cu = CuArray{ComplexF32}(signal_cpu)
signal_gpu = StructArray(signal_cu)
```
Otherwise, usage is identical to the CPU example above, including multi-antenna tracking:
```julia
results_gpu = track(signal_gpu, state_gpu, sampling_frequency)
next_results_gpu = track(next_signal_gpu, get_state(results_gpu), sampling_frequency)
```
8 changes: 5 additions & 3 deletions src/Tracking.jl
@@ -5,9 +5,11 @@ module Tracking
StaticArrays,
TrackingLoopFilters,
StructArrays,
LoopVectorization
LoopVectorization,
CUDA

using Unitful: upreferred, Hz, dBHz, ms
import Base.zero, Base.length, Base.resize!
import Base.zero, Base.length, Base.resize!, LinearAlgebra.dot

export
get_early,
@@ -48,7 +50,7 @@ module Tracking

struct NumAnts{x}
end

NumAnts(x) = NumAnts{x}()

struct NumAccumulators{x}
29 changes: 29 additions & 0 deletions src/carrier_replica.jl
@@ -1,3 +1,8 @@
"""
$(SIGNATURES)

Fixed point CPU StructArray carrier replica generation
"""
function gen_carrier_replica!(
carrier_replica::StructArray{Complex{T}},
carrier_frequency,
@@ -19,6 +24,30 @@ end
"""
$(SIGNATURES)

Floating point CPU StructArray carrier generation
"""
function gen_carrier_replica!(
    carrier_replica::StructArray{Complex{T},1,NamedTuple{(:re, :im),Tuple{Array{T,1},Array{T,1}}},Int64},
    carrier_frequency,
    sampling_frequency,
    start_phase,
    carrier_amplitude_power::Val{N},
    start_sample,
    num_samples
) where {
    T <: AbstractFloat,
    N
}
    sample_range = start_sample:num_samples + start_sample - 1
    @views @. carrier_replica.re[sample_range] = 2pi * (sample_range) * carrier_frequency / sampling_frequency + start_phase
    @. carrier_replica.im[sample_range] = sin(carrier_replica.re[sample_range])
    @. carrier_replica.re[sample_range] = cos(carrier_replica.re[sample_range])
    return carrier_replica
end

Review comment (Member): I guess this is a leftover from previous tests. I think this can be removed.
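For comparison, the CUDA kernel further down generates the carrier as `2π * ((sample_idx - 1) * carrier_frequency / sampling_frequency + carrier_phase)`, i.e. with the start phase in cycles inside the `2π` factor and the sample count starting at zero, whereas the floating-point StructArray method above adds `start_phase` outside the `2π` factor and uses the absolute sample index. The following is a minimal CPU sketch of the kernel's convention, assuming phases in cycles; `carrier_replica_reference` is a hypothetical helper for illustration, not part of Tracking.jl.

```julia
# Hypothetical CPU reference (not part of Tracking.jl) that follows the CUDA
# kernel's phase convention: phase(n) = 2π * ((n - 1) * f / fs + φ0), φ0 in cycles.
function carrier_replica_reference(carrier_frequency, sampling_frequency, start_phase_cycles, num_samples)
    carrier = Vector{ComplexF32}(undef, num_samples)
    for n in 1:num_samples
        phase = 2π * ((n - 1) * carrier_frequency / sampling_frequency + start_phase_cycles)
        s, c = sincos(phase)          # one call yields both quadrature components
        carrier[n] = ComplexF32(c, s) # cos -> real part, sin -> imaginary part
    end
    return carrier
end
```

With a 1000 Hz carrier sampled at 4 kHz, each sample advances a quarter turn, so the first four samples are approximately `1`, `im`, `-1`, `-im`.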

"""
$(SIGNATURES)

Updates the carrier phase.
"""
function update_carrier_phase(
2 changes: 1 addition & 1 deletion src/correlator.jl
@@ -124,7 +124,7 @@ Get prompt correlator
function get_prompt(correlator::AbstractCorrelator, correlator_sample_shifts)
correlator.accumulators[get_prompt_index(correlator_sample_shifts)]
end

CUDA.dot

Review comment (Member): What's that?

"""
$(SIGNATURES)

128 changes: 127 additions & 1 deletion src/downconvert_and_correlate.jl
@@ -29,4 +29,130 @@ function downconvert_and_correlate(
accumulators_result = complex.(a_re, a_im)
C(map(+, get_accumulators(correlator), accumulators_result))
end
=#
=#

# CUDA Kernel
function downconvert_and_correlate_kernel(
    res_re,
    res_im,
    signal_re,
    signal_im,
    carrier_re,
    carrier_im,
    codes,
    code_frequency,
    correlator_sample_shifts,
    carrier_frequency,
    sampling_frequency,
    start_code_phase,
    carrier_phase,
    code_length,
    prn,
    num_samples,
    num_ants,
    num_corrs
)
    cache = @cuDynamicSharedMem(Float32, (2 * blockDim().x, num_ants, num_corrs))
    sample_idx = 1 + ((blockIdx().x - 1) * blockDim().x + (threadIdx().x - 1))
    antenna_idx = 1 + ((blockIdx().y - 1) * blockDim().y + (threadIdx().y - 1))
    corr_idx = 1 + ((blockIdx().z - 1) * blockDim().z + (threadIdx().z - 1))
    iq_offset = blockDim().x
    cache_index = threadIdx().x - 1

    code_phase = accum_re = accum_im = dw_re = dw_im = 0.0f0
    mod_floor_code_phase = Int(0)

    if sample_idx <= num_samples && antenna_idx <= num_ants && corr_idx <= num_corrs
        # generate carrier
        carrier_im[sample_idx], carrier_re[sample_idx] = CUDA.sincos(2π * ((sample_idx - 1) * carrier_frequency / sampling_frequency + carrier_phase))

Review comment (Member): For what purpose do you save the sincos result in a vector? Is it for performance improvements in the future? If this is the case, let's drop that here to have a clean baseline.


        # downconvert with the conjugate of the carrier
        dw_re = signal_re[sample_idx, antenna_idx] * carrier_re[sample_idx] + signal_im[sample_idx, antenna_idx] * carrier_im[sample_idx]
        dw_im = signal_im[sample_idx, antenna_idx] * carrier_re[sample_idx] - signal_re[sample_idx, antenna_idx] * carrier_im[sample_idx]

        # calculate the code phase
        code_phase = code_frequency / sampling_frequency * ((sample_idx - 1) + correlator_sample_shifts[corr_idx]) + start_code_phase

        # wrap the code phase into a 1-based code index,
        # e.g. for code_length = 1023 a code phase of 1023.4 maps to index 1
        mod_floor_code_phase = 1 + mod(floor(Int32, code_phase), code_length)

        # multiply by the code chip and accumulate
        accum_re += codes[mod_floor_code_phase, prn] * dw_re
        accum_im += codes[mod_floor_code_phase, prn] * dw_im
    end

    cache[1 + cache_index + 0 * iq_offset, antenna_idx, corr_idx] = accum_re
    cache[1 + cache_index + 1 * iq_offset, antenna_idx, corr_idx] = accum_im

    ## Reduction
    # wait until every thread in the block has written its accumulators to the cache
    sync_threads()

    i::Int = blockDim().x ÷ 2
    @inbounds while i != 0
        if cache_index < i
            cache[1 + cache_index + 0 * iq_offset, antenna_idx, corr_idx] += cache[1 + cache_index + 0 * iq_offset + i, antenna_idx, corr_idx]
            cache[1 + cache_index + 1 * iq_offset, antenna_idx, corr_idx] += cache[1 + cache_index + 1 * iq_offset + i, antenna_idx, corr_idx]
        end
        sync_threads()
        i ÷= 2
    end

    if (threadIdx().x - 1) == 0
        res_re[blockIdx().x, antenna_idx, corr_idx] += cache[1 + 0 * iq_offset, antenna_idx, corr_idx]
        res_im[blockIdx().x, antenna_idx, corr_idx] += cache[1 + 1 * iq_offset, antenna_idx, corr_idx]
    end
    return nothing
end
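Taken together, the per-thread products, the in-block reduction, and the final `sum(..., dims=1)` in the wrapper compute, for each antenna and correlator, the sum over samples of code chip × signal × conj(carrier). A sequential CPU sketch of that quantity can serve as a reference when testing the kernel. `correlate_reference` and its argument layout (signal as samples × antennas, codes as code_length × PRN, carrier phase in cycles, code phase in chips) are assumptions that mirror the kernel's indexing, not an API of Tracking.jl. Like the wrapper, it processes the whole signal from its first sample.

```julia
# Hypothetical sequential CPU reference (not part of Tracking.jl) for the value the
# kernel plus wrapper return: one complex correlation per antenna and correlator.
function correlate_reference(signal::AbstractMatrix{ComplexF32}, codes::AbstractMatrix{<:Real},
                             prn, code_frequency, carrier_frequency, sampling_frequency,
                             start_code_phase, carrier_phase, correlator_sample_shifts)
    num_samples, num_ants = size(signal)
    code_length = size(codes, 1)
    num_corrs = length(correlator_sample_shifts)
    # same 1 × num_ants × num_corrs shape as sum(res_re .+ 1im*res_im, dims=1)
    result = zeros(ComplexF32, 1, num_ants, num_corrs)
    for corr_idx in 1:num_corrs, antenna_idx in 1:num_ants, sample_idx in 1:num_samples
        # carrier replica with the phase in cycles, as in the kernel
        s, c = sincos(2π * ((sample_idx - 1) * carrier_frequency / sampling_frequency + carrier_phase))
        carrier = ComplexF32(c, s)
        # downconvert with the conjugate of the carrier
        dw = signal[sample_idx, antenna_idx] * conj(carrier)
        # code phase including the correlator shift, wrapped to a 1-based index
        code_phase = code_frequency / sampling_frequency * ((sample_idx - 1) + correlator_sample_shifts[corr_idx]) + start_code_phase
        code_idx = 1 + mod(floor(Int, code_phase), code_length)
        result[1, antenna_idx, corr_idx] += codes[code_idx, prn] * dw
    end
    return result
end
```

For a signal built from an alternating ±1 code at one chip per sample and a matching carrier, the prompt tap (shift 0) over 16 samples comes out near 16, while the taps shifted by one sample land near -16 because adjacent chips always differ.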

function downconvert_and_correlate_kernel_wrapper(
    system,
    signal,
    correlator,
    code_replica,
    code_phase,
    carrier_replica,
    carrier_phase,
    downconverted_signal,
    code_frequency,
    correlator_sample_shifts,
    carrier_frequency,
    sampling_frequency,
    signal_start_sample,
    num_samples_left,
    prn
)
    num_corrs = length(correlator_sample_shifts)
    num_ants = size(signal, 2)
    num_samples = size(signal, 1)
    block_dim_z = num_corrs
    block_dim_y = num_ants
    # keep num_corrs and num_ants in separate block dimensions and shrink the sample (x)
    # dimension to the largest power of two that keeps the block within 1024 threads
    block_dim_x = prevpow(2, 1024 ÷ block_dim_y ÷ block_dim_z)
    threads = (block_dim_x, block_dim_y, block_dim_z)
    blocks = cld(size(signal, 1), block_dim_x)
    res_re = CUDA.zeros(Float32, blocks, block_dim_y, block_dim_z)
    res_im = CUDA.zeros(Float32, blocks, block_dim_y, block_dim_z)
    shmem_size = sizeof(ComplexF32)*block_dim_x*block_dim_y*block_dim_z
    @cuda threads=threads blocks=blocks shmem=shmem_size downconvert_and_correlate_kernel(
        res_re,
        res_im,
        signal.re,
        signal.im,
        carrier_replica.carrier.re,
        carrier_replica.carrier.im,
        system.codes,
        Float32(code_frequency),
        correlator_sample_shifts,
        Float32(carrier_frequency),
        Float32(sampling_frequency),
        Float32(code_phase),
        Float32(carrier_phase),
        size(system.codes, 1),
        prn,
        num_samples,
        num_ants,
        num_corrs
    )
    return sum(res_re .+ 1im*res_im, dims=1)
end
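The wrapper's launch geometry and the two-level reduction it relies on (a halving tree inside each block, then `sum` over the per-block results) can be checked on the CPU. The following is a minimal sketch assuming the same hardcoded 1024-thread block limit as the wrapper; `launch_config`, `tree_reduce!` and `two_level_sum` are hypothetical helpers for illustration only.

```julia
# Hypothetical helper reproducing the wrapper's launch-geometry arithmetic.
function launch_config(num_samples, num_ants, num_corrs)
    # antennas (y) and correlators (z) share one block; x gets the largest
    # power of two that keeps the block at no more than 1024 threads
    block_dim_x = prevpow(2, 1024 ÷ num_ants ÷ num_corrs)
    threads = (block_dim_x, num_ants, num_corrs)
    blocks = cld(num_samples, block_dim_x)
    # Float32 real and imaginary slots per thread, as in @cuDynamicSharedMem
    shmem_bytes = sizeof(ComplexF32) * block_dim_x * num_ants * num_corrs
    return (; threads, blocks, shmem_bytes)
end

# Sequential emulation of the in-block halving reduction for one cache column.
# At each level, thread `cache_index < i` adds the element `i` slots ahead; reads
# and writes are disjoint, so sequential order gives the same result as parallel.
function tree_reduce!(cache::AbstractVector)
    @assert ispow2(length(cache)) "the halving loop assumes a power-of-two block size"
    i = length(cache) ÷ 2
    while i != 0
        for cache_index in 0:i-1
            cache[1 + cache_index] += cache[1 + cache_index + i]
        end
        i ÷= 2  # one sync_threads() barrier per level on the GPU
    end
    return cache[1]  # thread 0 stores this in res[blockIdx().x, ...]
end

# Per-block partial sums (zero-padded past the end, like out-of-range threads
# writing zero accumulators), followed by the wrapper's sum over blocks.
function two_level_sum(x::AbstractVector{Float32}, block_dim_x)
    blocks = cld(length(x), block_dim_x)
    padded = vcat(x, zeros(Float32, blocks * block_dim_x - length(x)))
    partials = [tree_reduce!(padded[(b - 1) * block_dim_x + 1 : b * block_dim_x]) for b in 1:blocks]
    return sum(partials)
end
```

For one antenna and three correlators this yields 256 × 1 × 3 threads per block; with four antennas the sample dimension shrinks to 64. Because `prevpow` requires an argument of at least 1, this geometry assumes `num_ants * num_corrs <= 1024`.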