Introduction to ungroup

Abstract

The ungroup R package introduces a versatile method for ungrouping histograms (binned count data), assuming that the counts are Poisson distributed and that the underlying sequence to be estimated on a fine grid is smooth. The method is based on the composite link model, and estimation is achieved by maximizing a penalized likelihood. Smooth, detailed sequences of counts and rates are thereby estimated from the binned counts. Ungrouping binned data can be desirable for several reasons: bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different histograms use different groupings; and the last interval is often wide and open-ended and thus conceals a lot of information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because its assumptions are modest, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and its applications, see Rizzi, Gampe, and Eilers (2015).
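
As a rough sketch of the model described above, the penalized composite link model can be written as follows. The notation is an assumption chosen for illustration, loosely following the cited paper: y holds the I observed bin counts, γ the J expected counts on the fine grid, C is an I × J composition matrix of zeros and ones that sums fine-grid cells into bins, X is a basis matrix (for example B-splines), D_d is a d-th order difference matrix, and λ is the smoothing parameter.

```latex
\begin{align*}
y_i &\sim \mathrm{Poisson}(\mu_i), \qquad i = 1, \dots, I, \\
\mu &= C\gamma, \\
\gamma &= \exp(X\beta), \\
\ell^{*}(\beta) &= \sum_{i=1}^{I} \bigl( y_i \log \mu_i - \mu_i \bigr)
                  - \frac{\lambda}{2} \, \lVert D_d \beta \rVert^{2}.
\end{align*}
```

Maximizing the penalized log-likelihood over β yields the smooth fine-grid sequence γ; larger values of λ give smoother estimates.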

Package Structure

The package has two top-level functions (pclm and pclm2D), two auxiliary functions (control.pclm and control.pclm2D), and methods for several generic functions (plot, summary, fitted, residuals). A dataset (ungroup.data) is also provided for testing purposes.

All functions are documented in the standard way: once you have loaded the package with library(ungroup), you can type, for example, ?pclm to open the corresponding help file.

# Load the package
library(ungroup)

Usage
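
A minimal sketch of a one-dimensional call is given below. The counts are made up for illustration and are not taken from ungroup.data, and the width of the open-ended last interval (nlast = 20) is an assumption. The call pclm(x, y, nlast) is used as described in the package help; consult ?pclm for the full argument list.

```r
library(ungroup)

# Hypothetical binned data: lower bound of each age group and the count in it.
# These numbers are invented for illustration only.
x <- seq(0, 80, by = 10)                              # bins start at 0, 10, ..., 80
y <- c(120, 30, 45, 80, 150, 320, 610, 900, 700)      # one count per bin

# Assumed width of the last, open-ended interval (here ages 80 to 99).
nlast <- 20

# Fit the penalized composite link model with default settings.
fit <- pclm(x, y, nlast)

# Inspect the result with the generic functions the package provides.
summary(fit)
plot(fit)

# Ungrouped counts on the fine grid (single-year steps by default).
ungrouped <- fitted(fit)
head(ungrouped)
```

With single-year output steps, the fitted sequence covers the range from the first bin's lower bound to the end of the last interval, so 100 values in this sketch.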

Acknowledgment

We thank Paul H.C. Eilers, who provided insight and expertise that greatly supported the creation of this R package, and Catalina Torres and Tim Riffe for testing early versions of the software and offering feedback.

The authors are also grateful to the following institutions for their support:

  • University of Southern Denmark;
  • Max-Planck Institute for Demographic Research;
  • SCOR Corporate Foundation for Science.

References

Currie, Iain D, Maria Durban, and Paul HC Eilers. 2004. “Smoothing and Forecasting Mortality Rates.” Statistical Modelling 4 (4): 279–98.
Eilers, Paul HC. 2007. “Ill-Posed Problems with Counts, the Composite Link Model and Penalized Likelihood.” Statistical Modelling 7 (3): 239–54. https://doi.org/10.1177/1471082X0700700302.
Hastie, Trevor J, and Robert J Tibshirani. 1990. “Generalized Additive Models.” Monographs on Statistics and Applied Probability 43.
Rizzi, Silvia, Jutta Gampe, and Paul H. C. Eilers. 2015. “Efficient Estimation of Smooth Distributions from Coarsely Grouped Data.” American Journal of Epidemiology 182 (2): 138–47. https://doi.org/10.1093/aje/kwv020.
Thompson, R, and RJ Baker. 1981. “Composite Link Functions in Generalized Linear Models.” Applied Statistics, 125–31.