Distributional copula regression for space-time data
A07 develops novel models for multivariate spatio-temporal data using distributional copula regression. Of particular interest are tests for the significance of predictors and automatic variable selection using Bayesian selection priors. In the long run, the project will consider computationally efficient modeling of non-stationary dependencies using stochastic partial differential equations.
Project Leaders
Prof. Dr. Holger Dette
Faculty of Mathematics - Chair of Stochastics
Ruhr University Bochum
Prof. Dr. Nadja Klein
Department of Informatics - Scientific Computing Center
Karlsruhe Institute of Technology
Summary
Modeling dependencies in space-time data is of interest for several projects of TRR 391, and copulas are an important mathematical tool for capturing such potentially complex associations. In this project, we will develop novel models for multivariate spatio-temporal data based on copulas and distributional regression. In particular, we leverage statistical testing and Bayesian shrinkage priors to induce sparse yet flexibly varying dependence structures between multiple outcomes observed over space and time. Distributional regression makes it possible to describe entire conditional distributions, including the dependence structure, as functions of space, time and potentially further covariates. To find a reasonable model, we will construct statistical tests to determine the copula specification and complement these with automatic variable selection based on Bayesian variable selection priors. The latter are particularly appealing because they allow for hierarchical model specifications and modular estimation in potentially high-dimensional spatio-temporal copula regression models. Estimation is planned to be conducted by variational inference and generalized Bayesian methods. In the long term, we will consider non-stationary dependence structures, handle irregularly observed and missing space-time data, and leverage deep learning methods to capture high-dimensional interactions across the joint covariate, space and time domains more thoroughly.
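To make the idea of a dependence structure that varies with space and time more concrete, the following minimal sketch may help. It is an illustration under stated assumptions, not the project's model: it assumes a bivariate Gaussian copula, a purely linear predictor in one spatial coordinate s and time t, and a tanh link that maps the predictor to a correlation in (-1, 1). All function and parameter names are hypothetical, and the inputs u1, u2 are assumed to be margins already transformed to the unit interval.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def predictor(beta, s, t):
    """Hypothetical linear predictor in a spatial coordinate s and time t."""
    b0, b_s, b_t = beta
    return b0 + b_s * s + b_t * t


def rho_from_predictor(eta):
    """Assumed link: map an unconstrained predictor to a correlation in (-1, 1)."""
    return math.tanh(eta)


def gaussian_copula_logpdf(u1, u2, rho):
    """Log-density of the bivariate Gaussian copula at (u1, u2) in (0, 1)^2."""
    z1 = _STD_NORMAL.inv_cdf(u1)
    z2 = _STD_NORMAL.inv_cdf(u2)
    one_minus_rho2 = 1.0 - rho * rho
    quad = rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2
    return -0.5 * math.log(one_minus_rho2) - quad / (2.0 * one_minus_rho2)


def copula_loglik(beta, data):
    """Sum the copula log-density over records (u1, u2, s, t).

    The correlation is recomputed for every record, so the dependence
    between the two outcomes changes with location and time.
    """
    total = 0.0
    for u1, u2, s, t in data:
        rho = rho_from_predictor(predictor(beta, s, t))
        total += gaussian_copula_logpdf(u1, u2, rho)
    return total


if __name__ == "__main__":
    # Made-up toy records: (u1, u2, spatial coordinate, time)
    toy = [(0.9, 0.8, 0.1, 0.0), (0.2, 0.3, 0.5, 1.0), (0.6, 0.7, 0.9, 2.0)]
    print(copula_loglik((0.0, 0.0, 0.0), toy))  # independence copula
    print(copula_loglik((0.2, 0.1, 0.1), toy))  # dependence grows with s and t
```

In a full distributional copula regression, the marginal distribution parameters would also depend on covariates, and the linear terms would typically be replaced by smooth or spatial effects. The sketch only shows how a copula parameter can be tied to space and time through a link function.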