Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review of dfms: Efficient Dynamic Factor Models for R? #555

Closed
2 of 20 tasks
SebKrantz opened this issue Sep 18, 2022 · 5 comments
Closed
2 of 20 tasks

Review of dfms: Efficient Dynamic Factor Models for R? #555

SebKrantz opened this issue Sep 18, 2022 · 5 comments
Assignees

Comments

@SebKrantz
Copy link

SebKrantz commented Sep 18, 2022

Submitting Author Name: Sebastian Krantz
Submitting Author Github Handle: @SebKrantz
Other Package Authors Github handles: @rbagd
Repository: https://github.com/SebKrantz/dfms
Submission type: Pre-submission
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: dfms
Version: 0.1.1
Title: Dynamic Factor Models
Authors@R: c(person("Sebastian", "Krantz", role = c("aut", "cre"), email = "sebastian.krantz@graduateinstitute.ch"),
             person("Rytis", "Bagdziunas", role = "aut"))
Description: Efficient estimation of Dynamic Factor Models using the Expectation Maximization (EM) algorithm 
  or Two-Step (2S) estimation, on datasets with missing data. The implementation follows advances in the econometric 
  literature: estimation can be done either by running the Kalman Filter and Smoother once with initial values 
  from PCA - following Doz, Giannone and Reichlin (2011) (2S) - or via iterated Kalman Filtering and Smoothing until EM 
  convergence - following Doz, Giannone and Reichlin (2012) - or using the adapted EM algorithm of Banbura and Modugno 
  (2014), allowing estimation with arbitrary patterns of missing data. The implementation makes heavy use of the 
  Armadillo C++ library and the collapse package, providing for particularly speedy estimation. A comprehensive set of 
  methods supports interpretation/visualization of the model and forecasting. Information criteria to choose the number 
  of factors are also provided - following Bai and Ng (2002).
  --- Key References: ---
  Doz, C., Giannone, D., & Reichlin, L. (2011). A two-step estimator for large approximate dynamic 
       factor models based on Kalman filtering. Journal of Econometrics, 164(1), 188-205.
  Doz, C., Giannone, D., & Reichlin, L. (2012). A quasi-maximum likelihood approach for large, approximate 
       dynamic factor models. Review of economics and statistics, 94(4), 1014-1024.
  Banbura, M., & Modugno, M. (2014). Maximum likelihood estimation of factor models on datasets with arbitrary 
       pattern of missing data. Journal of Applied Econometrics, 29(1), 133-160.
URL: https://sebkrantz.github.io/dfms/
BugReports: https://github.com/SebKrantz/dfms/issues
Depends: R (>= 3.0.0)
Imports: Rcpp (>= 1.0.1), collapse (>= 1.8.0)
LinkingTo: Rcpp, RcppArmadillo
Suggests: xts, vars, magrittr, testthat (>= 3.0.0), covr
License: GPL-3
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE, roclets = c ("namespace", "rd", "srr::srr_stats_roclet"))
RoxygenNote: 7.1.2
Config/testthat/edition: 3

Scope

  • Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check an appropriate box below):

    Data Lifecycle Packages

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis

    Statistical Packages

    • Bayesian and Monte Carlo Routines
    • Dimensionality Reduction, Clustering, and Unsupervised Learning
    • Machine Learning
    • Regression and Supervised Learning
    • Exploratory Data Analysis (EDA) and Summary Statistics
    • Spatial Analyses
    • Time Series Analyses
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

Dynamic factor models are a time series modelling and dimensionality reduction technique.

I'm working on this.

  • Who is the target audience and what are scientific applications of this package?

Anybody working with time series. The package is useful for dimensionality reduction and forecasting with a large amount of time series.

See README.md. In short: dfms is the much faster, provides multiple estimation methods, and has a comprehensive set of methods for exploring the model and forecasting. It is less specialized than economic nowcasting packages.

No Applicable.

  • Any other questions or issues we should be aware of?:

First, I would like to ask if you think you'll be able to review this package in a statistical sense. Then, I will likely not be able to comply with all of your standards, as I intend to export some C++ level helper function (mainly efficient Kalman Filtering and Smoothing functions) without any checks on the inputs. My hope here is in part to provide infrastructure that more specialized software (such as nowcasting packages) can take advantage of. The iterative filtering and smoothing performed in the estimation of dynamic factor models via expectation maximization (EM) algorithms does not square well with R-level checks in those functions (which would be executed many times per second).

@adamhsparks
Copy link
Member

Hi @SebKrantz, thanks for your inquiry.

I've checked with some of the other editors. We flexibility around input checking in low-level R or C code intended for export in other packages, but this package appears have a high-level interface and so that interface should be standards-compliant.

Additionally, some of us have had difficulties installing the package as it currently is written. @mpadge had difficulties installing due to "undefined symbol: dpocon_". Error perfectly repeatable in a rocker/tidyverse container. However, I am able to build it on my M1 MacBook with no errors. This should be addressed before we can proceed.

@SebKrantz
Copy link
Author

Hi @adamhsparks. Thanks for these initial comments. Let me work a bit further on it still to ensure the package passes your checks as good as possible and we can discuss further about it. What I can definitely offer to do it rigorous checking of inputs at the C++ level in the Kalman Filtering functions. I've not done this so far because Armadillo already has build in checks and throws appropriate error messages, making it practically impossible to pass something wrong. But let me see if I can appease your tests.

Regarding the installation issue. I'm sorry for that, it turned out that BLAS and LAPACK libraries were linked through a Makevars.win file, instead of a global Makevars file. I have fixed this now.

@SebKrantz
Copy link
Author

SebKrantz commented Oct 3, 2022

I have now brought dfms to a state where I am happy with it. The software is quite robust and inputs are rigorously checked. I still get some autotest issues, particularly in the main DFM() function, but I don't understand those, as all inputs parameters received the full extent of checking, including for data type, case insensitivity and permissible range.

I have quickly gone through the srr standards, and in my opinion dfms broadly meets most of them. I have made some comments in srr-stats-standards.R below standards where I have done things a bit differently.

Regarding unit testing, there is room for much improvement, in particular one could set up R translations of the authors original Matlab codes (which are provided under misc/) to test against. For now I have manually verified equivalence of dfms to these codes, and test against some hard-coded parameter values.

I general, my time on this package is a bit limited due to more important development commitments that I have, but it would be nice to get some review, and also to be able to release it to CRAN before Christmas. Let me know how you think about this.

@adamhsparks
Copy link
Member

Hi @SebKrantz, good to hear that you've managed to improve the package.

There are no requirements for the package review and CRAN. That is, you can put it on CRAN and have it reviewed asynchronously, they don't affect each other.

We can initiate the review process as soon as you can open a new issue requesting the review. We would strive to have the first reviews done within three weeks of the reviewers being assigned. This of course is flexible to work with the reviewers' own commitments as well. So we might have something by Christmas for the reviews for you, but I'd not be surprised if it slipped a little past that.

@SebKrantz
Copy link
Author

Thanks @adamhsparks for the clarification. I will then open an issue requesting review, and also prepare a first CRAN release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants