Observations should be independent #76

Closed · tkoskela opened this issue May 27, 2020 · 22 comments · Fixed by #79

Comments

@tkoskela (Member)

Remove cov_obs; get the weights by multiplying independent likelihoods, one per observation.

@tkoskela (Member Author)

Each observation follows a normal distribution with mean = height and variance = observation noise.

@AlexandrosBeskos (Member)

Some issues:

  • Observations are independent, and the log-weight for particle X[i], i = 1, 2, ..., should simply be:
    logw[i] := \sum_{stations} [ -(Y[station] - X[i,station])^2 / (2 * obs_noise_amplitude) ]

  • Notice that, following standard practice, I calculate the log-weights; then, to change into weights and apply resampling etc., again following standard practice, one finds
    M = max_i(logw[i]) and then carries out the calculations:
    a) w[i] = exp(logw[i] - M) ----> so the largest w is exactly equal to 1.
    b) Standardise, to sum to one, by setting W[i] = w[i] / \sum_i w[i].
    Resampling is now carried out with the weights W[i], i = 1, 2, ...
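
A minimal Julia sketch of steps (a)-(b) above (the variable names are illustrative, not taken from the package):

logw = [-3.2, -1.7, -5.0]    # log-weights, one per particle
M = maximum(logw)
w = exp.(logw .- M)          # largest weight is exactly 1
W = w ./ sum(w)              # standardised weights, sum to one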

@AlexandrosBeskos (Member)

rr, inv_rr are obsolete.

@tkoskela (Member Author)

This could be achieved by just replacing cov_obs with I * obs_noise_amplitude, right?

@AlexandrosBeskos (Member)

  1. Just be careful not to confuse variance with standard deviation.
    "Amplitude" is a bit ambiguous for a statistician: I would use obs_noise_std (with std standing for standard deviation).
  2. Also please have a thought about computational cost: in the case of a general cov_obs matrix one has to do matrix/vector calculations, whereas in the case of independent observations there is no need for matrices/vectors, i.e. the calculation should be super-fast.

@tkoskela (Member Author)

Well, generally speaking matrix/vector calculations are the fastest possible calculations you can do on CPUs, so independent observations are not necessarily faster. I will test this and give some numbers.

@tkoskela (Member Author)

  1. Just be careful not to confuse variance with standard deviation.
    "Amplitude" is a bit ambiguous for a statistician: I would use obs_noise_std (with std standing for standard deviation).

Just making sure I understand. Do you then want the observation noise to be drawn from a normal distribution with mean = 0, std = obs_noise_std? At the moment this is not the case; rather, it's obs_noise_amplitude * Normal(mean=0, std=1)

https://github.com/Team-RADDISH/TDAC.jl/blob/da97941defab0717a846cec0800644ff0c4f35ce/src/TDAC.jl#L272

@tkoskela (Member Author)

This could be achieved by just replacing cov_obs with I * obs_noise_amplitude, right?

I am mostly advocating this for two reasons:

  1. It requires very small changes in the code
  2. It allows us to keep leveraging the Distributions.jl package very neatly

https://github.com/Team-RADDISH/TDAC.jl/blob/da97941defab0717a846cec0800644ff0c4f35ce/src/TDAC.jl#L127

(ignore my comment there, it does work 😊)

I tested replacing

https://github.com/Team-RADDISH/TDAC.jl/blob/da97941defab0717a846cec0800644ff0c4f35ce/src/TDAC.jl#L576

with

cov_obs = Diagonal(ones(params.nobs) * params.obs_noise_amplitude)

The calculation of weights using the parameters from #74 sped up from 82 microseconds to 30 microseconds, and in both cases it is an insignificant fraction of the total run time. I'll try implementing it as a loop sampling from independent normal distributions to see if it becomes much faster.
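
For reference, a self-contained sketch of the diagonal-covariance approach using Distributions.jl (the sizes and values here are illustrative, not the package's actual parameters):

using Distributions, LinearAlgebra

nobs = 4                     # illustrative number of observations
obs_noise_amplitude = 0.5    # illustrative noise variance

# A Diagonal covariance lets MvNormal avoid dense matrix operations
cov_obs = Diagonal(fill(obs_noise_amplitude, nobs))
d = MvNormal(zeros(nobs), cov_obs)
logpdf(d, randn(nobs))       # log-likelihood of one simulated observation vector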

@tkoskela (Member Author) commented May 27, 2020

The answer to the question of which way is faster is: it depends on the number of particles (and it does not matter much either way).

Here is a test using 100 particles:

 ────────────────────────────────────────────────────────────────────────────────
                                         Time                   Allocations      
                                 ──────────────────────   ───────────────────────
        Tot / % measured:             14.9s / 100%             148MiB / 100%     

 Section                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────
 Weights Diagonal            10   46.3μs  0.00%  4.63μs   20.0KiB  0.07%  2.00KiB
 Weights Independent         10   31.8μs  0.00%  3.18μs     0.00B  0.00%    0.00B

and here is a test using 1000 particles:

 ────────────────────────────────────────────────────────────────────────────────
                                         Time                   Allocations      
                                 ──────────────────────   ───────────────────────
        Tot / % measured:             14.9s / 100%             148MiB / 100%     

 Section                 ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────────
 Weights Independent         10    216μs  0.00%  21.6μs     0.00B  0.00%    0.00B
 Weights Diagonal            10    185μs  0.00%  18.5μs    161KiB  0.11%  16.1KiB

I labeled the way I suggested above "Weights Diagonal". The function labeled "Weights Independent" looks like this:

using Distributions: Normal, logpdf   # needed for the per-observation likelihoods

function get_weights!(weight::AbstractVector{T},
                      obs::AbstractVector{T},
                      obs_model::AbstractMatrix{T},
                      obs_noise_std::T) where T

    nobs = size(obs_model, 1)
    nprt = size(obs_model, 2)
    @assert size(obs, 1) == nobs
    @assert size(weight, 1) == nprt

    # Accumulate log-weights, so start from zero (a constant offset would
    # cancel on normalisation anyway, but zero is the natural initialisation)
    weight .= 0.0

    for iobs = 1:nobs
        for iprt = 1:nprt
            # Independent observations: sum the per-station log-likelihoods
            weight[iprt] += logpdf(Normal(obs[iobs], obs_noise_std), obs_model[iobs, iprt])
        end
    end

    # Exponentiate and standardise so the weights sum to one
    @. weight = exp(weight)
    weight ./= sum(weight)

end
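
A hypothetical usage sketch of the function above (sizes and values are illustrative only):

nobs, nprt = 8, 100
obs = randn(nobs)                      # one observation per station
obs_model = randn(nobs, nprt)          # modelled observations per station and particle
weight = Vector{Float64}(undef, nprt)
get_weights!(weight, obs, obs_model, 1.0)
sum(weight)                            # ≈ 1.0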

@tkoskela (Member Author)

I will leave both methods in the code, in case weight calculation ever becomes a bottleneck.

@AlexandrosBeskos (Member)

Well, generally speaking matrix/vector calculations are the fastest possible calculations you can do on CPUs, so independent observations are not necessarily faster. I will test this and give some numbers.

Well, I think there are some matrix inversions also involved, so as long as the system understands it is dealing with a diagonal matrix (so such inversions are trivial) then the two approaches should be fairly similar.
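
Julia's LinearAlgebra does treat Diagonal specially, so such inversions are elementwise; a quick illustration (values arbitrary):

using LinearAlgebra

D = Diagonal([2.0, 3.0, 4.0])
inv(D)    # Diagonal([0.5, 0.333..., 0.25]), no dense solve involved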

@AlexandrosBeskos (Member)

  1. Just be careful not to confuse variance with standard deviation.
    "Amplitude" is a bit ambiguous for a statistician: I would use obs_noise_std (with std standing for standard deviation).

Just making sure I understand. Do you then want the observation noise to be drawn from a normal distribution with mean = 0, std = obs_noise_std? At the moment this is not the case; rather, it's obs_noise_amplitude * Normal(mean=0, std=1)

https://github.com/Team-RADDISH/TDAC.jl/blob/da97941defab0717a846cec0800644ff0c4f35ce/src/TDAC.jl#L272

The two things you mention are equivalent, so there is no problem with either of them.
I mean that X ~ N(mean = 0, std = s) is equivalent to X = s * N(mean = 0, std = 1).
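
A quick empirical check of this equivalence in Julia (illustrative only, not from the codebase):

using Distributions, Statistics

s = 2.0
x1 = rand(Normal(0.0, s), 10^6)           # draw directly with std = s
x2 = s .* rand(Normal(0.0, 1.0), 10^6)    # scale standard-normal draws by s
std(x1), std(x2)                          # both ≈ s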

@AlexandrosBeskos (Member)

I will leave both methods in the code, in case weight calculation ever becomes a bottleneck.

Yes, that is absolutely fine.

@AlexandrosBeskos (Member)

@. weight = exp(weight)
weight ./= sum(weight)

At exactly this part of the code - the exponentiation of the log-weights - it is standard practice in the area to apply the following to avoid numerical instabilities:

a) get log-weights for all particles.

b) adding a constant to all these log-weights does not affect the algorithm, as the effect of the constant will disappear upon exponentiating & standardising the weights (to sum up to one). But it can have an effect in terms of avoiding numerical instabilities.
So, people make an adjustment at this point:

log-weights = log-weights - max(log-weights)

c) It is after this that one exponentiates:

weights = exp(log-weights),

and then standardises,

weights = weights/sum(weights).

@tkoskela (Member Author)

Thanks for the explanation; I did not really understand from your original message why you wanted to add a constant.

@AlexandrosBeskos (Member)

Is the problem you are worried about taking the exp of a very small value? In Julia this does not seem to be a problem; it will just return 0 when the value is small enough:

julia> exp(-1e1)
4.539992976248485e-5

julia> exp(-1e2)
3.720075976020836e-44

julia> exp(-1e3)
0.0

julia> exp(-Inf)
0.0

Well, this is partly exactly what people want to avoid: weights correspond to likelihoods, and there could be scenarios where all weights are exp(-large), which the system will set equal to 0, and that is not good.
This is what the subtraction of the maximum log-weight avoids: being in a position where all weights become zero due to numerical precision.
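
A small Julia illustration of that failure mode (the values are arbitrary):

logw = [-1000.0, -1001.0, -1002.0]
exp.(logw)                        # [0.0, 0.0, 0.0] -- every weight underflows

# Subtracting the maximum first keeps the weights well-defined
w = exp.(logw .- maximum(logw))   # [1.0, 0.3679, 0.1353]
W = w ./ sum(w)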

@giordano (Member)

@AlexandrosBeskos (Member)

Yes, this seems to be trying to do the same thing, though it seems either a bit overly complicated or sophisticated. I think a simple step like the one I have described is the standard that everybody uses in my field.

@tkoskela (Member Author) commented May 28, 2020

That function calculates a log-sum at the end which, as far as I understood, we don't want. I added a function to the PR that does the offset. Thanks again for explaining, Alex.

https://github.com/Team-RADDISH/TDAC.jl/blob/17ccdf2d4db44205bda660d31248b8cda20acc84/src/TDAC.jl#L92-L98

@AlexandrosBeskos (Member)

It could be that what is in the reference is already doing exactly what I have described; it is just that I am not familiar with all coding practices in Julia.

@AlexandrosBeskos (Member)

OK, from my limited/intuitive understanding of the "advanced" Julia commands that you have used, I would bet that all is fine now.

@tkoskela (Member Author)

Merged the changes to master in #79

@giordano linked a pull request May 28, 2020 that will close this issue