[Paper] Fixing a few issues (#153)
lrnv authored Feb 13, 2024
1 parent 6d5f5c5 commit 4ae6f36
Showing 1 changed file with 18 additions and 19 deletions.
37 changes: 18 additions & 19 deletions joss/paper.md
@@ -33,11 +33,11 @@ Copulas are functions that describe dependence structures of random vectors, wit

Copulas are standard tools in probability and statistics, with a wide range of applications from biostatistics, finance or medicine, to fuzzy logic, global sensitivity and broader analysis. A few standard theoretical references on the matter are [@joe1997], [@nelsen2006], [@joe2014], and [@durantePrinciplesCopulaTheory2015].

The Julia package `Copulas.jl` brings most standard copula-related features into native Julia: random number generation, density and distribution function evaluations, fitting, construction of multivariate models through Sklar's theorem, and many more related functionalities. Copulas being fundamentally distributions of random vectors, we fully comply with the [`Distributions.jl`](https://github.com/JuliaStats/Distributions.jl) API [@djl1; @djl2], the Julian standard for implementation of random variables and random vectors. This compliance allows interoperability with other packages based on this API such as, e.g., [`Turing.jl`](https://github.com/TuringLang/Turing.jl) [@turing] and several others.
The Julia package `Copulas.jl` brings most standard copula-related features into native Julia: random number generation, density and distribution function evaluations, fitting, construction of multivariate models through Sklar's theorem, and many more related functionalities. Since copulas can combine arbitrary univariate distributions to form distributions of multivariate random vectors, we fully comply with the [`Distributions.jl`](https://github.com/JuliaStats/Distributions.jl) API [@djl1; @djl2], the Julian standard for implementation of random variables and random vectors. This compliance allows interoperability with other packages based on this API such as, e.g., [`Turing.jl`](https://github.com/TuringLang/Turing.jl) [@turing] and several others.
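
For reference, the construction through Sklar's theorem mentioned above can be stated as follows. The notation is ours, introduced only for illustration: $F$ is the joint distribution function of a $d$-dimensional random vector, $F_1,\dots,F_d$ are its marginal distribution functions, and $C$ is a copula.

$$
F(x_1,\dots,x_d) = C\left(F_1(x_1),\dots,F_d(x_d)\right) \quad \text{for all } (x_1,\dots,x_d) \in \mathbb{R}^d .
$$

Such a copula $C$ always exists, and it is unique when the marginals are continuous. Conversely, plugging any marginals into any copula yields a valid joint distribution function.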

# Statement of need

The R package `copula` [@r_copula_citation1; @r_copula_citation2; @r_copula_citation3; @r_copula_citation4] is the gold standard when it comes to sampling, estimating, or simply working with dependence structures. However, in other languages, the available tools are not as developed and/or not as recognized. We bridge the gap in the Julian ecosystem with this Julia-native implementation. Due to the very flexible type system in Julia, our code expressiveness and tidiness will increase its usability and maintainability in the long run. Type-stability allows sampling in arbitrary precision without requiring more code, and Julia's multiple dispatch yields most of the below-described applications.
The R package `copula` [@r_copula_citation1; @r_copula_citation2; @r_copula_citation3; @r_copula_citation4] is the gold standard when it comes to sampling, estimating, or simply working with dependence structures. However, in other languages, the available tools are not as developed and/or not as recognized. We bridge the gap in the Julian ecosystem with this Julia-native implementation. Due to the very flexible type system in Julia, our code's expressiveness and tidiness will increase its usability and maintainability in the long run. Type-stability allows sampling in arbitrary precision without requiring more code, and Julia's multiple dispatch yields most of the below-described applications.

There are competing packages in Julia, such as [`BivariateCopulas.jl`](https://github.com/AnderGray/BivariateCopulas.jl) [@BivariateCopulas], which only deals with a few models in bivariate settings but has very nice graphs, or [`DatagenCopulaBased.jl`](https://github.com/iitis/DatagenCopulaBased.jl) [@DatagenCopulaBased_1; @DatagenCopulaBased_2; @DatagenCopulaBased_3; @DatagenCopulaBased_4], which only provides sampling and does not have exactly the same models as `Copulas.jl`. While not fully covering the functionality of both of these packages (mostly because the three projects chose different implementation paths), `Copulas.jl` brings, as a key feature, compliance with the broader ecosystem. The following table provides a feature comparison between the three:

@@ -53,13 +53,13 @@ There are competing packages in Julia, such as [`BivariateCopulas.jl`](https://g
| - Obscure Bivariate | Yes | No | No |
| - Archimedean Chains | No | Yes | No |

Since our primary target is maintainability and readability of the implementation, we did not consider the efficiency and the performance of the code yet. However, a (limited in scope) benchmark on Clayton's pdf shows competitive behavior of our implementation w.r.t `DatagenCopulaBased.jl` (but not `BivariateCopulas.jl`). To perform this test we use the [`BenchmarkTools.jl`](https://github.com/JuliaCI/BenchmarkTools.jl) [@BenchmarkTools] package and generate 10^6 samples for Clayton copulas of dimensions 2, 5, 10 with parameter 0.8. The execution times (in seconds) are given below:
Since our primary target is maintainability and readability of the implementation, we have not considered the efficiency and the performance of the code yet. However, a (limited in scope) benchmark on Clayton's `pdf` shows competitive behavior of our implementation w.r.t `DatagenCopulaBased.jl` (but not `BivariateCopulas.jl`). To perform this test we use the [`BenchmarkTools.jl`](https://github.com/JuliaCI/BenchmarkTools.jl) [@BenchmarkTools] package and generate 10^6 samples for Clayton copulas of dimensions 2, 5, 10 with parameter 0.8. The execution times (in seconds) are given below:

| | 2 | 5 | 10 |
|-----------------------------|-----------|-----------|-----------|
| Copulas.Clayton | 1.1495578 | 1.3448951 | 1.8044065 |
| BivariateCopulas.Clayton | 0.1331608 | X | X |
| DatagenCopulaBased.Clayton | 1.9868345 | 2.4276321 | 2.8009263 |
| | 2 | 5 | 10 |
|------------------------------|-----------|-----------|-----------|
| `Copulas.Clayton` | 1.1495578 | 1.3448951 | 1.8044065 |
| `BivariateCopulas.Clayton` | 0.1331608 | X | X |
| `DatagenCopulaBased.Clayton` | 1.9868345 | 2.4276321 | 2.8009263 |

Code for these benchmarks is available in the repository.

@@ -75,16 +75,15 @@ using Copulas, Distributions, Random
# Define the marginals and the copula, then use Sklar's theorem:
X₁ = Gamma(2,3)
X₂ = Pareto(0.5)
X₃ = Binomial(10,0.8)
X₃ = Normal(10,0.8)
C = ClaytonCopula(3,0.7)
X = SklarDist(C,(X₁,X₂,X₃))
D = SklarDist(C,(X₁,X₂,X₃))

# Sample from the model:
# Sample as follows:
x = rand(D,1000)

# You may estimate the model as follows:
D̂ = fit(SklarDist{FrankCopula,Tuple{Gamma,Normal,Binomial}}, x)
# Although you'll probably get a bad fit!
D̂ = fit(SklarDist{ClaytonCopula,Tuple{Gamma,Pareto,Normal}}, x)
```

The API only loosely specifies the fitting procedure rather than fixing it, so the default implementation may vary from one copula to another. If you want more control, you may turn to Bayesian estimation using `Turing.jl`:
@@ -103,26 +102,26 @@ using Turing
X₂ = Pareto(γ)
X₃ = Binomial(10,η)
C = ClaytonCopula(3,δ)
X = SklarDist(C,(X₁,X₂,X₃))
D = SklarDist(C,(X₁,X₂,X₃))

# Add the loglikelyhood to the model :
# Add the loglikelihood to the model :
Turing.@addlogprob! loglikelihood(D, dataset)
end
```

## The Archimedean interface

Archimedean copulas are a huge family of copulas that has seen a lot of theoretical work. Among others, you may take a look at [@mcneilMultivariateArchimedeanCopulas2009b]. We use [`WilliamsonTransforms.jl`](https://github.com/lrnv/WilliamsonTransforms.jl/)'s implementation of the Williamson $d$-transform to sample from any Archimedean copula, including for example the `ClaytonCopula` with negative dependence parameter in any dimension, which is a first to our knowledge.
Archimedean copulas form a large class of copulas that has seen a lot of theoretical work. Among others, you may take a look at [@mcneilMultivariateArchimedeanCopulas2009b]. We use [`WilliamsonTransforms.jl`](https://github.com/lrnv/WilliamsonTransforms.jl/)'s implementation of the Williamson $d$-transform to sample from any Archimedean copula, including for example the `ClaytonCopula` with negative dependence parameter in any dimension, which is a first to our knowledge.

To construct an Archimedean copula, you first need to reference its generator through the following API:

```julia
struct MyGenerator{T} <: Generator
struct MyGenerator{T} <: Copulas.Generator
θ::T
end
Copulas.ϕ(G::MyGenerator,t) = exp(-G.θ * t) # can you recognise this one ?
max_monotony(G::MyGenerator) = Inf
C = ArchimedeanCopula(d,MyGenerator())
Copulas.max_monotony(G::MyGenerator) = Inf
C = ArchimedeanCopula(4,MyGenerator(1.3)) # 4-dimensional copula
```

The obtained model automatically gets all copula functionalities (pdf, cdf, sampling, dependence measures, etc.). We nevertheless have specific implementations for a (large) list of known generators, and, for better performance, you may implement additional methods when closed-form formulas are known. The use of the (inverse) Williamson $d$-transform allows the technical boundaries of our Archimedean implementation to *match* the necessary and sufficient conditions for a generator to produce a genuine Archimedean copula.
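
As a sanity check on the generator used in the example, the following package-free sketch evaluates the usual Archimedean formula $C(u) = \phi\left(\phi^{-1}(u_1) + \dots + \phi^{-1}(u_d)\right)$ by hand. The helper names `ϕ_inv` and `archimedean_cdf` are hypothetical and written only for this illustration; they are not part of the `Copulas.jl` API, which obtains these quantities through its own machinery.

```julia
# Pure-Julia sketch: no packages needed, all names below are illustrative only.

# Generator from the example above, and its inverse (solve u = exp(-θ t) for t).
ϕ(θ, t) = exp(-θ * t)
ϕ_inv(θ, u) = -log(u) / θ

# Archimedean copula cdf: C(u) = ϕ(ϕ⁻¹(u₁) + … + ϕ⁻¹(u_d)).
archimedean_cdf(θ, u) = ϕ(θ, sum(x -> ϕ_inv(θ, x), u))

u = [0.2, 0.5, 0.7, 0.9]

# For this generator the cdf collapses to the product of the coordinates,
# i.e. the independence copula, whatever the value of θ > 0.
@assert archimedean_cdf(1.3, u) ≈ prod(u)
@assert archimedean_cdf(2.5, u) ≈ prod(u)
```

The parameter cancels because $\phi^{-1}(u) = -\log(u)/\theta$, so the sum inside $\phi$ is $-\log(u_1 \cdots u_d)/\theta$; the example generator therefore yields the independence copula in any dimension.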