Plans
Often we'll need to use a model with different variable names than whatever it comes with. So something like
```julia
julia> m = @model begin
           μ ~ Normal()
           x ~ Normal(μ, 1)
       end

julia> rename(m, :μ => :mean)
@model begin
    mean ~ Normal()
    x ~ Normal(mean, 1)
end
```
This should be straightforward.
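One way to get there (a sketch, assuming models carry their source expressions; `rename` itself and the `rename_expr` helper are hypothetical, not current API) is a symbol substitution over the model body, so both the definition of `μ` and every later use are renamed consistently:

```julia
using MacroTools: postwalk

# Walk the expression bottom-up and swap symbols per the given pairs.
# Hypothetical helper; `rename` is proposed, not implemented.
function rename_expr(ex, subs::Pair{Symbol,Symbol}...)
    d = Dict(subs...)
    postwalk(x -> x isa Symbol ? get(d, x, x) : x, ex)
end

rename_expr(:(begin μ ~ Normal(); x ~ Normal(μ, 1) end), :μ => :mean)
```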
Consider a nested modeling setup, like
```julia
prior = @model begin
    λ ~ HalfCauchy()
    a ~ Exponential(λ)
    b ~ Exponential(λ)
end

m = @model begin
    pars ~ prior()
    x ~ Beta(pars.a, pars.b)
end
```
[This example is a little contrived; we need a better one. But we have run into this situation in practice.]
Now, suppose we observe `x` and the nested `a` (that is, `pars.a`), and want to sample the remaining variables conditional on these values.
Currently, there's not a nice way to do this. Model conditioning doesn't compose nicely (see Rainforth, 2018 for details). Instead, we can unroll the nesting and then proceed as usual:
```julia
unroll(m, :pars) == @model begin
    pars ~ @struct begin
        pars_λ ~ HalfCauchy()
        pars_a ~ Exponential(pars_λ)
        pars_b ~ Exponential(pars_λ)
    end
    x ~ Beta(pars.a, pars.b)
end
```
The notation isn't settled yet and we need to think through the semantics of this. The point is, the result should be one model where we can condition on any subset of variables we like.
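Once that's in place, conditioning on the nested `a` might look something like this (purely hypothetical syntax; the flattened names and the `|` conditioning operator are assumptions, not current API):

```julia
# Condition the unrolled model on x and the formerly-nested a.
flat = unroll(m, :pars)
post = flat | (x = 0.7, pars_a = 2.0)
```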
SymPy lacks array support and forces a Python dependency that slows down precompilation. We should transition to SymbolicUtils.
Missing data problems are very common. Many PPLs handle this, but most (all?) assume MCAR (missing completely at random), which is a very special case.
We should model missing data by explicitly representing the missingness process, and have convenient combinators for simple cases like MCAR.
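For instance, a model with an explicit missingness process might look like this (a sketch using ordinary Soss model syntax; nothing here is current API for actually handling the `missing` values):

```julia
# The probability that x goes unobserved depends on x itself (MNAR),
# which the usual MCAR assumption rules out.
m_miss = @model begin
    x ~ Normal()
    miss ~ Bernoulli(1 / (1 + exp(-x)))
end
```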
Many of the limitations of Soss are inherited from Distributions.jl. Addressing these is one of the main goals of Measures.jl.
Soss currently has a `return` option, but we're not yet doing much with it. Some of this is complicated a bit by the semantics of the existing `rand` and `logpdf`, which implicitly disallow latent variables. It's not usually stated, but these usually obey a law like
```julia
hasmethod(logpdf, (typeof(dist), typeof(rand(dist))))
```
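For an ordinary Distributions.jl distribution, the law is easy to check directly:

```julia
using Distributions

dist = Normal(0, 1)
x = rand(dist)                                 # sampling gives a Float64
logpdf(dist, x)                                # logpdf accepts exactly that type
hasmethod(logpdf, (typeof(dist), typeof(x)))   # true
```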
Now, say we have something like
```julia
julia> m = @model begin
           μ ~ Normal()
           x ~ Normal(μ, 1)
           return x
       end
```
so `rand(m())` just returns a `Float64`. By the law above, we should be able to do something like

```julia
logpdf(m(), 0.2)
```
But now we're in a bad situation, because of other assumed properties (again, not usually stated) of `rand` and `logpdf`:

- `rand` should do "forward sampling" and produce "independent" samples (PRNG concerns aside)
- `logpdf` should be precise (not sampling-based) and fast (not requiring integration)
These functions are constrained because of semantics inherited from Base and Distributions, respectively. To work around this without breaking the semantics, we should have new functions:
- `sample` will be like `rand`, but may draw using MCMC
- `logdensity` will be like `logpdf`, but will require assignment of all latent variables needed for computation.
This way, `logpdf` can retain its current semantics.
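For the model above, the split might look like this (a sketch; the names `sample` and `logdensity` are proposed, and the exact signatures are not settled):

```julia
# Hypothetical usage, not current API.
θ = sample(m())                       # may draw the latent μ using MCMC
logdensity(m(), (μ = 0.3, x = 0.2))   # all latents supplied explicitly

# logpdf(m(), 0.2) would instead have to marginalize over μ. Here that
# is tractable (x is marginally Normal(0, √2)), but in general it
# requires integration, violating the properties above.
```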
To allow inference methods like those in Gen, we need variants of combinators like `For` to include representation of the `logdensity` contribution of every component. We can map to this type statically before inference as needed.
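As a rough sketch of the shape of such a variant (hypothetical type, not current Soss API):

```julia
# A traced variant of `For` that keeps each component's logdensity
# contribution instead of only the sum, as Gen-style inference requires.
struct TracedFor{D,T}
    dists::Vector{D}               # one distribution per component
    values::Vector{T}              # value drawn or assigned per component
    logdensities::Vector{Float64}  # per-component contribution
end

# The usual scalar logdensity is recovered by summing over the trace.
total_logdensity(t::TracedFor) = sum(t.logdensities)
```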
PPLs generally have a hard time with observing values that are specified deterministically. But in some cases this should not be a problem. For example, in
```julia
m = @model begin
    a ~ HalfNormal()
    x = log(a)
end
```
observing `x = 0` is equivalent to observing `a = 1`.
In general, this kind of trick works for any value defined as a bijection to a set of sampled values.
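Concretely, the inversion for the example above is just the inverse of the deterministic link (the helper name here is made up for illustration):

```julia
# Invert x = log(a): an observation of x pins down a = exp(x) exactly,
# so we can condition on a instead.
x_to_a(x_obs) = exp(x_obs)

x_to_a(0.0)   # 1.0, matching "observing x = 0 ⟺ observing a = 1"
```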
There's lots more we need to do in Measures.jl, all of which can support Soss (and then Delve).
- Product measure
- Superposition
- Mixture distributions
- Pushforward measure (special case of mixture)
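As a sense of shape, here's a minimal sketch of the first item (hypothetical types; the Measures.jl API is not settled, and component measures are assumed to have their own `logdensity` methods):

```julia
# A product measure factors over components, so its logdensity is the
# sum of the component logdensities.
struct ProductMeasure{T<:Tuple}
    components::T
end

logdensity(μ::ProductMeasure, x) = sum(map(logdensity, μ.components, x))
```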