Plans
Often we'll need to use a model with different variable names than whatever it comes with. So something like
```julia
julia> m = @model begin
           μ ~ Normal()
           x ~ Normal(μ, 1)
       end

julia> rename(m, :μ => :mean)
@model begin
    mean ~ Normal()
    x ~ Normal(mean, 1)
end
```
This should be straightforward.
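One way to get there (a sketch, assuming models carry their source expressions; `rename` itself and the `rename_expr` helper are hypothetical, not current API) is a symbol substitution over the model body, so both the definition of `μ` and every later use are renamed consistently:

```julia
using MacroTools: postwalk

# Walk the expression bottom-up and swap symbols per the given pairs.
# Hypothetical helper; `rename` is proposed, not implemented.
function rename_expr(ex, subs::Pair{Symbol,Symbol}...)
    d = Dict(subs...)
    postwalk(x -> x isa Symbol ? get(d, x, x) : x, ex)
end

rename_expr(:(begin μ ~ Normal(); x ~ Normal(μ, 1) end), :μ => :mean)
```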
Consider a nested modeling setup, like
```julia
prior = @model begin
    λ ~ HalfCauchy()
    a ~ Exponential(λ)
    b ~ Exponential(λ)
end

m = @model begin
    pars ~ prior()
    x ~ Beta(pars.a, pars.b)
end
```
[This example is a little contrived; we need a better one. But we have run into this situation in practice.]
Now, suppose we observe `x` and the nested `a` (that is, `pars.a`), and want to sample the remaining variables conditional on these values.
Currently, there's not a nice way to do this. Model conditioning doesn't compose nicely (see Rainforth, 2018 for details). Instead, we can unroll the nesting and then proceed as usual:
```julia
unroll(m, :pars) == @model begin
    pars ~ @struct begin
        pars_λ ~ HalfCauchy()
        pars_a ~ Exponential(pars_λ)
        pars_b ~ Exponential(pars_λ)
    end
    x ~ Beta(pars.a, pars.b)
end
```
The notation isn't settled yet and we need to think through the semantics of this. The point is, the result should be one model where we can condition on any subset of variables we like.
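Once that's in place, conditioning on the nested `a` might look something like this (purely hypothetical syntax; the flattened names and the `|` conditioning operator are assumptions, not current API):

```julia
# Condition the unrolled model on x and the formerly-nested a.
flat = unroll(m, :pars)
post = flat | (x = 0.7, pars_a = 2.0)
```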
SymPy lacks array support and forces a Python dependency that slows down precompilation. We should transition to SymbolicUtils.
Missing data problems are very common. Many PPLs handle this, but most (all?) assume MCAR (missing completely at random), which is a very special case.
We should model missing data by explicitly representing the missingness process, and have convenient combinators for simple cases like MCAR.
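For instance, a model with an explicit missingness process might look like this (a sketch using ordinary Soss model syntax; nothing here is current API for actually handling the `missing` values):

```julia
# The probability that x goes unobserved depends on x itself (MNAR),
# which the usual MCAR assumption rules out.
m_miss = @model begin
    x ~ Normal()
    miss ~ Bernoulli(1 / (1 + exp(-x)))
end
```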
Many of the limitations of Soss are inherited from Distributions.jl. Addressing these is one of the main goals of Measures.jl.
Soss currently has a `return` option, but we're not yet doing much with it. Some of this is complicated a bit by the semantics of the existing `rand` and `logpdf`, which implicitly disallow latent variables. It's not usually stated, but these usually obey a law like
```julia
hasmethod(logpdf, (typeof(dist), typeof(rand(dist))))
```
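For an ordinary Distributions.jl distribution, the law is easy to check directly:

```julia
using Distributions

dist = Normal(0, 1)
x = rand(dist)                                 # sampling gives a Float64
logpdf(dist, x)                                # logpdf accepts exactly that type
hasmethod(logpdf, (typeof(dist), typeof(x)))   # true
```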
Now, say we have something like
```julia
julia> m = @model begin
           μ ~ Normal()
           x ~ Normal(μ, 1)
           return x
       end
```
so `rand(m())` just returns a `Float64`. By the law above, we should be able to do something like

```julia
logpdf(m(), 0.2)
```
But now we're in a bad situation, because of other assumed properties (again, not usually stated) of `rand` and `logpdf`:

- `rand` should do "forward sampling" and produce "independent" samples (PRNG concerns aside)
- `logpdf` should be precise (not sampling-based) and fast (not requiring integration)
These functions are constrained because of semantics inherited from Base and Distributions, respectively. To work around this without breaking the semantics, we should have new functions:
- `sample` will be like `rand`, but may draw using MCMC
- `logdensity` will be like `logpdf`, but will require assignment of all latent variables needed for computation.
This way, `logpdf` can retain its current semantics.
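For the model above, the split might look like this (a sketch; the names `sample` and `logdensity` are proposed, and the exact signatures are not settled):

```julia
# Hypothetical usage, not current API.
θ = sample(m())                       # may draw the latent μ using MCMC
logdensity(m(), (μ = 0.3, x = 0.2))   # all latents supplied explicitly

# logpdf(m(), 0.2) would instead have to marginalize over μ. Here that
# is tractable (x is marginally Normal(0, √2)), but in general it
# requires integration, violating the properties above.
```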
To allow inference methods like those in Gen, we need variants of combinators like `For` to include representation of the `logdensity` contribution of every component. We can map to this type statically before inference as needed.
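As a rough sketch of the shape of such a variant (hypothetical type, not current Soss API):

```julia
# A traced variant of `For` that keeps each component's logdensity
# contribution instead of only the sum, as Gen-style inference requires.
struct TracedFor{D,T}
    dists::Vector{D}               # one distribution per component
    values::Vector{T}              # value drawn or assigned per component
    logdensities::Vector{Float64}  # per-component contribution
end

# The usual scalar logdensity is recovered by summing over the trace.
total_logdensity(t::TracedFor) = sum(t.logdensities)
```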
PPLs generally have a hard time with observing values that are specified deterministically. But in some cases this should not be a problem. For example, in
```julia
m = @model begin
    a ~ HalfNormal()
    x = log(a)
end
```
observing `x = 0` is equivalent to observing `a = 1`.
In general, this kind of trick works for any value defined as a bijection to a set of sampled values.
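Concretely, the inversion for the example above is just the inverse of the deterministic link (the helper name here is made up for illustration):

```julia
# Invert x = log(a): an observation of x pins down a = exp(x) exactly,
# so we can condition on a instead.
x_to_a(x_obs) = exp(x_obs)

x_to_a(0.0)   # 1.0, matching "observing x = 0 ⟺ observing a = 1"
```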
There's lots more we need to do in Measures.jl, all of which can support Soss (and then Delve).
- Product measure
- Superposition
- Mixture distributions
- Pushforward measure (special case of mixture)
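As a sense of shape, here's a minimal sketch of the first item (hypothetical types; the Measures.jl API is not settled, and component measures are assumed to have their own `logdensity` methods):

```julia
# A product measure factors over components, so its logdensity is the
# sum of the component logdensities.
struct ProductMeasure{T<:Tuple}
    components::T
end

logdensity(μ::ProductMeasure, x) = sum(map(logdensity, μ.components, x))
```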