# Quetelet seminars 2015 and before

- Statistical Considerations in Assessing Quality of Care (Sharon-Lise Normand, December 18, 2015)
- On the physical interpretation of a meta-analysis in the presence of heterogeneity and bias: from clinical trials to Mendelian randomisation (Jack Bowden, December 14, 2015)
- A cylindrical model for wind direction and velocity (Dr. Toshihiro Abe, November 5, 2015)
- Semiparametric methods for multiphase response-selective sampling schemes (Dr. Gustavo Amorim, July 3, 2015)
- Choosing a mixed effect model for subject-specific inference (Dr. Rosanna Overholser, April 24, 2015)
- Ordered regressions with applications (Dr. Ori Davidov, March 31, 2015)
- Is sitting the new smoking? (dr. Katrien De Cocker and prof. dr. Els Goetghebeur, March 11, 2015)
- Adolphe Quetelet & the Graphical Depiction of Categorical Data (Eric Beh, February 6, 2015)
- Instrumental variable analysis with no valid instruments (Stephen Burgess, January 26, 2015)
- Information greedy bootstrap inference on random networks (Vyacheslav Lyubchich, December 17, 2014)
- How to learn from a lot: Empirical Bayes in Genomics (Mark van de Wiel, December 11, 2014)
- The Dantzig Selector for Censored Linear Regression Models: Identifying Predictive Genes for Myeloma Disease Progression (Yi Li, January 8, 2010)
- Testing for association between a genetic marker and disease status using family data (Gudrun Jonasdottir, December 8, 2006)
- Sensitivity Analysis after Multiple Imputation under Missing At Random - A Weighting Approach (James Carpenter, May 4, 2006)
- Developing an E-course for future students of the Master in Statistical Data Analysis (Fanghong Zhang, November 9, 2005)
- Spatial Clustering Detection for Censored Outcomes: A Cumulative Martingale Residual Approach (Yi Li, July 4, 2005)
- A robust fit for Generalized Additive Models (Matias Salibian-Barrera, May 23, 2005)
- Graphical Diagnostics for Lack-of-Fit in Regression Models (Ellen Deschepper, 4 March 2005)
- Comparison of two structural modeling approaches to estimate the effect of Hormone Therapy (Krista Fischer, 11 February 2005)
- A Bayesian approach to jointly estimate center and treatment by center heterogeneity in a proportional hazards model (Catherine Legrand, 11 February 2005)
- Probability Models for Nonnegative Random Variables, with Application in Survival Analysis and Reliability (Ingram Olkin, 31 January 2005)
- Analysis of microarray data in a dose-response setting (Ziv Shkedy, 27 January 2005)
- Graphical Diagnostics for Lack-of-Fit in Regression Models (Ellen Deschepper, 17 December 2004)
- Applying Methods from Machine Learning to Insurance Problems (Andreas Christmann, 22 November 2004)
- Analysis of Developmental Toxicity Data (Christel Faes, 12 November 2004)
- Fast bootstrap methods for robust estimators (Gert Willems, 22 October 2004)
- A history of smooth tests of goodness of fit (J.C.W. Rayner, 17 September 2004)
- Robust Variable Selection (Ruben Zamar, 14 May 2004)
- Screening for Potentially Informative Dropout in Longitudinal Studies with Binary Outcome (Tom Loeys, 7 May 2004)
- Missing Data Methods for Structural Equation Models (Peter M. Bentler, 14 April 2004)
- The valuation of Asian options for market models of exponential Levy type (Hansjörg Albrecher, 2 April 2004)
- Beyond ignorance: evaluating the plausibility of possible parameter estimates and inferences when data are missing (James Carpenter, 6 February 2003)
- Developments in Longitudinal Studies (Geert Verbeke, 5 December 2003)
- Methodology for genetic analyses of twins and families (Sylvie Goetgeluk, 31 October 2003)
- Functionals of clusters of extremes (Johan Segers, 24 October 2003)
- The Shared Frailty Model (Paul Janssen, 12 September 2003)
- Semiparametric regression for repeated outcomes with nonignorable intermittent nonresponse (S. Vansteelandt, A. Rotnitzky and J. Robins, 18 June 2003)
- Multiscale triangulations and second generation wavelets in nonlinear smoothing of scattered data (Maarten Jansen, 13 June 2003)
- Mapping Soil Texture at a Regional Scale using Pedometrical Techniques (Marc Van Meirvenne, 06 June 2003)
- Model averaging, post-model selection inference and the focussed information criterion (Gerda Claeskens, 27 May 2003)
- Inference on Survival Data with Covariate Measurement Error - An Imputation-based Approach (Yi Li, 20 May 2003)
- Reporting and Statistics using SAS Enterprise Guide (Saar De Zutter, 9 May 2003)
- The implementation of cancer registration in Belgium: a never ending story? (Joost Weyler, 25 April 2003)
- The analysis of QoL data (Kristel Van Steen, 11 April 2003)
- The S-Language (Saar De Zutter, 14 March 2003)
- Score tests for detecting linkage to quantitative traits (Hein Putter, 28 February 2003)
- Modelling family data: from segregation to linkage analysis (Jeanine Houwing-Duistermaat, 24 January 2003)
- Introduction to Robust Statistics (Stefan Van Aelst, 17 January 2003)
- The challenge of patient choice and non-adherence to treatment in RCTs of counselling and psychotherapy (Graham Dunn, 10 January 2003)
- Structural accelerated failure time models and recurrent events (An Vandebosch, 13 December 2002)
- Het schatten van parameters in rentevoetmodellen [Estimating parameters in interest rate models] (Ella Roelant, 6 December 2002)
- Testability of the Coarsening At Random (CAR) assumption (Eric Cator, 18 November 2002)
- Causal graphs in epidemiology (Stijn Vansteelandt, 6 November 2002)
- A frailty model for HIV infection in mobile and non-mobile cohorts from a rural district of South Africa (Khangelani Zuma, 20 September 2002)
- Improving response prediction in direct marketing by optimizing for specific mailing depths (Van den Poel, A. Prinzie & P. Van Kenhove, 20 September 2002)

## Statistical Considerations in Assessing Quality of Care

Sharon-Lise Normand

Harvard University

Friday, December 18, 2015, 15h30 - ?

A2, Sterre Campus Building S9, 9000 Gent

### Abstract:

In this talk, I describe three prototypical health policy problems with emphasis on the statistical challenges in drawing valid conclusions. The first problem involves identification and quantification of racial/ethnic disparities and geographic disparities on the basis of multiple quality indicators. The data are observational repeated cross-sectional panels spanning 7 years. The second problem involves determining whether drug reformulations (permitted in the U.S. by the 1984 Hatch-Waxman Act) of existing products benefit patients. We examine observational claims data to assess the effectiveness of antidepressant reformulations compared to their original molecules. The third problem relates to the diffusion of new medical technology and involves summarizing drug prescribing behaviors for three new therapeutically-similar drugs using dispensing information for nearly 17,000 U.S. physicians over a decade. Features of the data include time-varying drug choice sets due to different launch dates, semi-continuous response data, and multivariate outcomes.

This work is funded, in part, by grants U01-MH103018 and R01-MH093359, both from the National Institute of Mental Health, and is joint with Haiden Huskamp (Harvard Medical School), Julie Donohue (University of Pittsburgh), and Marcela Horvitz-Lennon (the RAND Corporation).

## On the physical interpretation of a meta-analysis in the presence of heterogeneity and bias: from clinical trials to Mendelian randomisation

Jack Bowden

University of Bristol, U.K.

Monday, December 14, 2015, 12h00 - 13h00

V2, Sterre Campus Building S9, 9000 Gent

### Abstract:

The funnel plot is a graphical visualisation of summary data estimates from a meta-analysis, and is a useful tool for detecting departures from the standard modelling assumptions. Although perhaps not widely appreciated, a simple extension of the funnel plot can help to facilitate an intuitive interpretation of the mathematics underlying a meta-analysis at a more fundamental level, by equating it to determining the centre of mass of a physical system. We used this analogy, with some success, to explain the concepts of weighing evidence and of biased evidence to a young audience at the Cambridge Science Festival, without recourse to precise definitions or statistical formulae. In this talk I aim to formalise this analogy at a more technical level using the estimating equation framework: firstly, to help elucidate some of the basic statistical models employed in a meta-analysis and secondly, to forge new connections between bias adjustment in the evidence synthesis and causal inference literatures.
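The centre-of-mass analogy can be made concrete with a minimal sketch. It assumes the standard fixed-effect (inverse-variance weighted) pooled estimate, which may differ from the formulation used in the talk; the function name `pooled_estimate` and the study values are hypothetical and serve only as illustration.

```python
def pooled_estimate(estimates, std_errors):
    """Fixed-effect (inverse-variance weighted) pooled estimate.

    Each study acts like a point mass placed at its estimate, with mass
    equal to its precision 1 / se**2. The pooled estimate is the centre
    of mass of that system.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_mass = sum(weights)
    centre = sum(w * y for w, y in zip(weights, estimates)) / total_mass
    return centre, weights


# Hypothetical study-level estimates and standard errors (illustration only).
estimates = [0.10, 0.30, 0.50]
std_errors = [0.10, 0.20, 0.40]

centre, weights = pooled_estimate(estimates, std_errors)

# Estimating equation: the weighted deviations about the centre sum to zero,
# just as the moments about a centre of mass balance out.
balance = sum(w * (y - centre) for w, y in zip(weights, estimates))

print(f"pooled estimate: {centre:.4f}")
print(f"balance of moments: {balance:.2e}")
```

The most precise study carries the largest mass and therefore pulls the pooled estimate towards its own value.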

## A cylindrical model for wind direction and velocity

Dr. Toshihiro Abe

Nanzan University

Thursday, November 5, 2015, 13h00 - ?

V2, Sterre Campus Building S9, 9000 Gent

### Abstract:

In this talk, we propose length-dependent cylindrical models for wind data. In many cases, the distribution of the wind direction near the sea coast has two modes. To accommodate this, we review some circular distributions generated by perturbation of symmetric circular distributions. We then combine two existing methods, cosine perturbation and sine-skewing, to obtain a well-fitting model for the wind direction, and make use of the method recently introduced by Abe & Ley to obtain a new class of cylindrical models. The mathematical properties of the model will also be investigated. As an illustrative example, we consider parameter estimation for our model using wind directions and speeds in the Namie district.

*This is joint work with Christophe Ley and Takayuki Shiohama.

## Semiparametric methods for multiphase response-selective sampling schemes

Dr. Gustavo Amorim

Ghent University, Belgium

Friday, July 3, 2015, 16h00 - ?

E2.009, Campus Coupure, Building E, 9000 Gent

### Abstract:

Response-selective multiphase sampling schemes, where full information has been observed for only a sample of the finite population, are common in most medical research. They may arise by design, for efficiency or cost-reduction purposes as in case-control studies, or by happenstance, when units fail to respond. In this talk we present a semiparametric method, which is a direct extension of the well-known conditional likelihood method. We discuss its performance through simulations and explore its asymptotic efficiency by numerically deriving the semiparametric efficiency bound.

After the seminar, the Center for Statistics is happy to invite you for a drink near the campus.

## Choosing a mixed effect model for subject-specific inference

Dr. Rosanna Overholser

Ghent University, Belgium

Friday, April 24, 2015, 16h15 - ?

V1, Campus De Sterre, Krijgslaan 281, building S9, 9000 Gent

### Abstract:

Clustered data, such as repeated measurements on a set of subjects, are often modeled by a "mixed effect" model. This type of model contains both population level effects ("fixed") and subject level effects ("random"). Inference may be desired on either a population or a subject level. A conditional Akaike Information (cAI) was proposed by Vaida and Blanchard in 2005 for the purpose of subject-specific inference and a corresponding model selection criterion was developed under linear mixed models. I will discuss estimation of cAI under general and generalized linear mixed models. Estimation of cAI via exact calculations is not available outside linear mixed models so asymptotic approximations and a bootstrap are used.

References:

- Vaida, F., & Blanchard, S. (2005). Conditional Akaike information for mixed-effects models. Biometrika, 92(2), 351-370.
- Donohue, M. C., Overholser, R., Xu, R., & Vaida, F. (2011). Conditional Akaike information under generalized linear and proportional hazards mixed models. Biometrika, 98(3), 685-700.
- Overholser, R., & Xu, R. (2014). Effective degrees of freedom and its application to conditional AIC for linear mixed-effects models with correlated error structures. Journal of Multivariate Analysis, 132, 160-170.

## Ordered regressions with applications

Dr. Ori Davidov

University of Haifa, Haifa, Israel

Tuesday, March 31, 2015, 15h00 - ?

A0.030, Campus Coupure, Building A

### Abstract:

There are often situations where two or more regression functions are ordered over a range of covariate values. We develop efficient constrained estimation and testing procedures for such models. Specifically, necessary and sufficient conditions for ordering generalized linear regressions are given and shown to unify previous results obtained for simple linear regression, polynomial regression, and analysis of covariance models. We show that estimating the parameters of ordered linear regressions requires either quadratic programming or semi-infinite programming, depending on the shape of the covariate space. A distance-type test for order is proposed. Simulations demonstrate that the proposed methodology improves the mean-square error and power compared with the usual, unconstrained, estimation and testing procedures. Improvements are often substantial. The methodology is extended to ordered generalized linear models, where convex semi-infinite programming plays a role. The methodology is applied to a longitudinal hearing loss study.

## Is sitting the new smoking?

dr. Katrien De Cocker and prof. dr. Els Goetghebeur

Wednesday, March 11, 2015, 12h00 - 13h00

Museum of the History of Science, Building S30, Campus De Sterre, Krijgslaan 281, 9000 Gent, Belgium

### Abstract:

Join a discussion by dr. Katrien De Cocker and prof. dr. Els Goetghebeur of how the essential tools of study design uncover the hidden risks of your own desk job, from heart disease and diabetes to genetic mutations, and how you can apply these tools to boost your own research.

## Adolphe Quetelet & the Graphical Depiction of Categorical Data

Eric Beh

Friday, February 6, 2015, 13h00 - ?

Room E2.009, Building E, Coupure Campus, Coupure links 653, 9000 Gent, Belgium

### Abstract:

Typically, a test of independence is conducted to assess the statistical significance of the association between categorical variables that are cross-classified to form a contingency table. When it is concluded that an association exists between the variables, little attention is given to the nature of the association. One strategy for examining the structure of the association between the variables is the data visualisation technique of correspondence analysis. This presentation will provide a brief overview of a variety of aspects of the technique. We shall discuss issues concerning the analysis of two categorical variables and extend our discussion to the analysis of multiple categorical variables. We shall motivate our overview by considering the analysis of a selection of tables found in Adolphe Quetelet's "Sur l'homme et le développement de ses facultés".

Coffee and soft drinks will be available from 12h30 in front of room E2.009 for those who want to bring their sandwich with them.

## Instrumental variable analysis with no valid instruments

Stephen Burgess

Department of Public Health and Primary Care, University of Cambridge, UK

Monday, January 26, 2015, 12h00 - ?

Room V2, Krijgslaan 281-S9, 2nd floor, 9000 Gent, Belgium

### Abstract:

Dr. Stephen Burgess is a Sir Henry Wellcome Postdoctoral Fellow at the University of Cambridge. His main area of research is causal inference and specifically methods for Mendelian randomization. This has so far incorporated work on meta-analysis, evidence synthesis, and missing data. He has a specific involvement in problems of inference with weak instruments. For more information, see http://www.phpc.cam.ac.uk/people/ceu-group/ceu-research-staff/stephen-burgess/

## Information greedy bootstrap inference on random networks

Vyacheslav Lyubchich

Department of Statistics & Actuarial Science at the University of Waterloo (Waterloo, Canada)

Wednesday, December 17, 2014, 11h00 - ?

Classroom 2.2, Site Hoveniersberg, 2nd floor, 9000 Gent, Belgium

### Abstract:

We propose a new nonparametric "patchwork" resampling approach to network inference, based on an adaptation of the "blocking" argument developed for bootstrapping time series and re-tiling spatial data. We focus on uncertainty quantification for network mean degree, under the assumption that both network degree distribution and network order are unknown. We develop a computationally efficient and data-driven cross-validation algorithm for selecting an optimal "patch" size. We apply the new "patchwork" bootstrap procedure to simulated networks and data on airline alliances and Wikipedia activity. This is joint work with Y. R. Gel and L. L. Ramirez Ramirez.

## How to learn from a lot: Empirical Bayes in Genomics

Mark van de Wiel

VU University Medical Center and VU University

Thursday, December 11, 2014, 15h00 - ?

Room Sint-Lucas, Monasterium PoortAckere, Oude Houtlei 56, 9000 Gent, Belgium

### Abstract:

The high-dimensional character of genomics data generally forces statistical inference methods to apply some form of penalization, e.g. multiple testing, penalized regression or sparse gene networks. The other side of the coin, however, is that the dimension of the variable space may also be used to learn across variables (like genes, tags, methylation probes, etc). Empirical Bayes is a powerful principle to do so. In both the Bayesian and frequentist paradigms it comes down to estimation of the a priori distribution of parameter(s) from the data. We briefly review some well-known statistical methods that use empirical Bayes to analyze genomics data. We believe, however, that the principle is often not used at its full strength. We illustrate the flexibility and versatility of the principle in three settings: 1) Bayesian inference for differential expression from count data (e.g. RNAseq), 2) prediction of binary response, and 3) network reconstruction.

For 1) we develop a novel algorithm, ShrinkBayes, for the efficient simultaneous estimation of multiple priors, allowing joint shrinkage of multiple parameters in differential gene expression models. This can be attractive when sample sizes are small or when many nuisance variables like batch effects are present. For 2) we demonstrate how auxiliary information in the form of 'co-data', e.g. p-values from an external study or genomic annotation, can be used to improve prediction of binary response, like tumor recurrence. We derive empirical Bayes estimates of penalties of groups of variables in a classical logistic ridge regression setting, and show that multiple sources of co-data may be used. Finally, for 3) we combine empirical Bayes with computationally efficient variational Bayes approximations of posteriors for the purpose of gene network reconstruction by the use of structural equation models. These models regress each gene on all others, and hence this setting can be regarded as a combination of 1) and 2).

## The Dantzig Selector for Censored Linear Regression Models: Identifying Predictive Genes for Myeloma Disease Progression

Yi Li

Dana-Farber Cancer Institute, Harvard School of Public Health

Friday, January 8, 2010, 15h00 - ?

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

The Dantzig variable selector has recently emerged as a powerful tool for fitting regularized regression models. A key advantage is that it does not pertain to a particular likelihood or objective function, as opposed to the existing penalized likelihood methods, and hence has the potential for wide applications. To our knowledge, almost all the Dantzig selector work has been performed with fully observed response variables. This talk introduces a new class of adaptive Dantzig variable selectors for linear regression models when the response variable is subject to right censoring. This is motivated by a clinical study of detecting predictive genes for myeloma patients' event-free survival, which is subject to right censoring. We establish the theoretical properties of our procedures, including consistency in model selection (i.e. the right subset model will be identified with a probability tending to 1) and the oracle property of the estimation (i.e. the asymptotic distribution of the estimates is the same as that when the true subset model is known a priori). The practical utility of the proposed adaptive Dantzig selectors is verified via extensive simulations. We apply the new method to the aforementioned myeloma clinical trial and identify important predictive genes for patients' event free survival.

## Testing for association between a genetic marker and disease status using family data

Gudrun Jonasdottir

Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm

Friday, December 8, 2006, 14h30 - 15h30

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Diseases with a genetic component tend to "cluster" [...] models in genetic association studies and use of offsets in FBAT and the proposed score test.

## Sensitivity Analysis after Multiple Imputation under Missing At Random - A Weighting Approach

James Carpenter

The London School of Hygiene and Tropical Medicine

Thursday, May 4, 2006, 16h30 - 17h30

V3, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

[...] Imputation under MAR. This provides ballpark estimates of the results of full NMAR modelling, indicating the extent to which it is necessary and providing a check on its results. We illustrate our approach with a small simulation study, and the analysis of data from a large multi-centre clinical trial.

## Developing an E-course for future students of the Master in Statistical Data Analysis

Fanghong Zhang

Ghent University

Wednesday, November 9, 2005, 13h30 - 14h30

A3, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

abstract09nov05.pdf

## Spatial Clustering Detection for Censored Outcomes: A Cumulative Martingale Residual Approach

Yi Li

Harvard University and Dana-Farber Cancer Institute, U.S.A.

Monday, July 4, 2005, 14h00 - 15h00

V1, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Numerous methods have been proposed to test for spatial clustering, particularly for binary or continuous outcomes. However, none has been proposed for outcomes which are subject to censoring. This project provides an extension of the spatial scan statistic (Kulldorff, 1997) for data with failure time outcomes using the log rank test statistic. It further proposes an extension of the cumulative geographic residual method that utilizes the model diagnostic techniques for censored outcomes. Application of these methods will be illustrated using the Home Allergens and Asthma prospective cohort study, which analyzed the relationship of environmental exposures with asthma, allergic rhinitis/hay fever, and eczema.

## A robust fit for Generalized Additive Models

Matias Salibian-Barrera

University of British Columbia

Monday, May 23, 2005, 13h00 - 14h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Generalized Additive Models (GAM) (Hastie and Tibshirani, 1986, 1990) are a powerful exploratory tool that is widely used in practice. Unfortunately, popular fitting algorithms for these models (e.g. the General Local Scoring Algorithm (GLSA), Hastie and Tibshirani, 1990) can be highly sensitive to a small proportion of observations that depart from the model.

In this talk I will describe a new robust fit for GAM models. The building blocks of this proposal are robust estimates for Quasi-Likelihood (QL) models (Cantoni and Ronchetti, 2001b; see also Stefanski, Carroll and Ruppert, 1986; and Künsch, Stefanski and Carroll, 1989) and the GLSA algorithm (Hastie and Tibshirani, 1986, 1990). Specifically, we adapt the GLSA algorithm using robust estimating equations to determine appropriate weights that transform the robust QL score equations into re-weighted least squares equations. We then iteratively fit weighted additive models, in the same spirit as GLSA. Bandwidth selection can be done automatically using a robust cross-validation criterion (Ronchetti and Staudte, 1994; Cantoni and Ronchetti, 2001).

This method will be illustrated on real and synthetic data. Simulation results suggest that the fit obtained with this algorithm is able to resist the effect of outliers in a number of different situations and that it also performs well when there are no atypical observations.

## Graphical Diagnostics for Lack-of-Fit in Regression Models

Ellen Deschepper

Ghent University

Friday, 4 March 2005, 13h00 - 14h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

abstract04mar05.pdf

## Comparison of two structural modeling approaches to estimate the effect of Hormone Therapy

Krista Fischer

Tartu University, Estonia

Friday, 11 February 2005, 14h00 - 15h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available

## A Bayesian approach to jointly estimate center and treatment by center heterogeneity in a proportional hazards model

Catherine Legrand

European Organization for Research and Treatment of Cancer

Friday, 11 February 2005, 13h00 - 14h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

abstract11feb05.pdf

## Probability Models for Nonnegative Random Variables, with Application in Survival Analysis and Reliability

Ingram Olkin

Stanford University

Monday, 31 January 2005, 17h00 - 18h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Nonnegative random variables arise in a wide variety of applications, in particular in reliability and survival analysis. Whereas for random variables on the whole line the normal distribution plays a distinctive role, for nonnegative random variables there is no distribution as pervasive as the normal distribution with its foundation in the central limit theorem.

We consider three classes of distributions: nonparametric, semiparametric and parametric families. Examples from each class are logconcave density families, increasing hazard rate families, and the Weibull distribution, respectively. We provide a survey of these three classes of families with an emphasis on the behavior of their hazard rates and on stochastic orderings.

Joint work with Albert Marshall.
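The hazard-rate behaviour of the parametric example, the Weibull distribution, can be illustrated with a short sketch. It uses the standard two-parameter Weibull hazard $h(t) = (k/\lambda)(t/\lambda)^{k-1}$; the function name `weibull_hazard` and the chosen shape values are assumptions made for illustration, not material from the talk.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


times = [0.5, 1.0, 1.5, 2.0]

# shape > 1: increasing hazard rate (an increasing-hazard-rate family member)
increasing = [weibull_hazard(t, shape=2.0) for t in times]
# shape == 1: constant hazard, i.e. the exponential distribution
constant = [weibull_hazard(t, shape=1.0) for t in times]
# shape < 1: decreasing hazard rate
decreasing = [weibull_hazard(t, shape=0.5) for t in times]

for label, values in [("shape=2.0", increasing),
                      ("shape=1.0", constant),
                      ("shape=0.5", decreasing)]:
    print(label, [round(v, 3) for v in values])
```

A single shape parameter thus moves the Weibull family between decreasing, constant and increasing hazard rates.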

## Analysis of microarray data in a dose-response setting

Ziv Shkedy

Limburgs Universitair Centrum

Thursday, 27 January 2005, 13h00 - 14h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

DNA microarrays have recently been used to monitor the expression levels of thousands of genes simultaneously and to identify those genes that are differentially expressed. As a result, the type I error (the probability of false identification) increases sharply when the number of tested genes gets large. In this talk we focus on a dose-response setting in which cDNA microarrays are available for four dose levels (3 microarrays at each dose level). A gene is differentially expressed if there is a trend (with respect to dose) in the gene intensity. We discuss several approaches to test the null hypothesis of no dose effect versus an ordered alternative. Resampling-based False Discovery Rate (FDR) and resampling-based Family-Wise Error Rate (FWER) procedures are used for controlling type I error.
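To illustrate what FDR control does when many genes are tested, here is a minimal sketch of the Benjamini-Hochberg step-up rule. This is a simpler, non-resampling stand-in for the resampling-based procedures mentioned in the abstract; the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Indices of the hypotheses ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    # Reject the hypotheses with the `cutoff` smallest p-values.
    return sorted(order[:cutoff])


# Hypothetical per-gene p-values (illustration only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print("rejected gene indices:", rejected)
```

With these eight p-values a Bonferroni threshold of 0.05 / 8 would flag only the first gene, whereas the step-up rule also flags the second.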

## Graphical Diagnostics for Lack-of-Fit in Regression Models

Ellen Deschepper

Ghent University

Friday, 17 December 2004, 13h00 - 14h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

abstract17dec04.pdf

## Applying Methods from Machine Learning to Insurance Problems

Andreas Christmann

University of Dortmund

Monday, 22 November 2004, 13h00 - 14h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

abstract22nov04.pdf

## Analysis of Developmental Toxicity Data

Christel Faes

Limburgs Universitair Centrum

Friday, 12 November 2004, 13h00 - 14h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

abstract12nov04.pdf

## Fast bootstrap methods for robust estimators

Gert Willems

Department of Mathematics and Computer Science, UA

Friday, 22 October 2004, 13h00 - 14h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

There are several situations in regression analysis, or multivariate location/scatter problems, where it is desirable to use robust estimators instead of the classical estimators. The idea of robust estimators is that they are able to resist contamination (e.g. in the form of so-called outliers) present in the dataset. In addition they should preferably be as efficient as possible in case of "clean" data.

Over the past 30 years, a vast literature has emerged on such estimators, but the inference part is often neglected. That is, it is often unclear how to obtain a reliable estimate of the variability of the estimator, or how to obtain accurate confidence limits for the parameters being estimated. Mostly, asymptotic variance results are used for this purpose. However, such asymptotic estimates may be inaccurate for small sample sizes and are often clearly inappropriate in situations where robust estimators are recommended.

Resampling methods such as Efron's bootstrap constitute an alternative to the asymptotic approach. Some drawbacks arise, though, when applying the classical bootstrap to robust estimators. The most serious of these is the computational cost of the bootstrap procedure. Indeed, computing robust estimators often requires time-consuming algorithms, and the classical bootstrap method requires the estimator to be recalculated many times. A second concern is that bootstrap inference can easily be adversely affected by contamination, even if the estimator itself was able to resist all outliers.

Several general approaches for handling the robustness problem are available. A general procedure to speed up the bootstrap method however is not. In this talk, some fast and robust bootstrap methods will be presented. In particular, methods for the popular Least Trimmed Squares and Minimum Covariance Determinant estimators will be considered, as well as for S-estimators and MM-estimators.
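The computational cost of the classical bootstrap mentioned above is easy to see in a small sketch: the estimator is recomputed on every resample. The code below shows the classical bootstrap only, not the fast and robust variants presented in the talk; the function name `bootstrap_std_error`, the data and the use of the median as a simple robust estimator are assumptions for illustration.

```python
import random
import statistics


def bootstrap_std_error(data, estimator, n_boot=1000, seed=0):
    """Classical (Efron) bootstrap estimate of an estimator's standard error."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # The estimator is recomputed on every resample: for expensive
        # robust estimators this loop dominates the running time.
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)


# Hypothetical sample with one gross outlier (the last value).
data = [2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.0, 2.7, 2.4, 9.8]

se_median = bootstrap_std_error(data, statistics.median)
se_mean = bootstrap_std_error(data, statistics.mean)
print(f"bootstrap SE of the median: {se_median:.3f}")
print(f"bootstrap SE of the mean:   {se_mean:.3f}")
```

Here the median is cheap to compute; for estimators such as Least Trimmed Squares each of the `n_boot` recomputations would itself be a costly optimisation, which is the motivation for faster schemes.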

## A history of smooth tests of goodness of fit

J.C.W. Rayner

School of Mathematics and Applied Statistics, University of Wollongong, Australia

Friday, 17 September 2004, 13h00 - 14h00

V2, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Goodness of fit testing is briefly reviewed, starting from the landmark paper of Pearson (1900). The smooth tests were introduced by Neyman (1937), but Pearson's so-called $X^2$ test is a smooth test. Emphasis is given to the recent work of Rayner and Best, and now Thas, which focuses on interpretability and more complex inference.
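As a reminder of the starting point of this history, Pearson's statistic compares observed and expected cell counts. The sketch below computes it for a hypothetical die-rolling example; the function name `pearson_chi_square` and the counts are illustrative assumptions.

```python
def pearson_chi_square(observed, expected):
    """Pearson's goodness-of-fit statistic: sum of (O - E)^2 / E over cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 120 rolls of a die, tested against a fair die.
observed = [18, 22, 20, 25, 15, 20]
expected = [20.0] * 6

x2 = pearson_chi_square(observed, expected)
print(f"Pearson statistic: {x2:.2f} on {len(observed) - 1} degrees of freedom")
```

Under the null hypothesis of a fair die the statistic is referred to a chi-squared distribution with 5 degrees of freedom.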

## Robust Variable Selection

Ruben Zamar

University of British Columbia

Friday, 14 May 2004, 13h00 - 14h00

A3, Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Robust model selection has not received much attention in the robustness literature. A few papers that address this issue include (Ronchetti 1984) and (Ronchetti and Staudte 1994), where the authors robustify the normal-theory selection criteria AIC and Cp, respectively. Morgenthaler et al. (2003) propose a selection technique to identify the correct model structure as well as unusual observations. Ronchetti et al. (1997) propose robust model selection by cross-validation.

One major drawback of robust model selection tools is that they are in general computationally intensive and time consuming, as they require the fitting of all submodels. One exception is the model selection based on the Wald test (Sommer and Huggins, 1996) which requires the computation of estimates only from the full model. However, fitting the `full' model may not be reasonable or computationally feasible.

In this study, we focus our attention on the robustification of Stepwise Regression. This will provide us with a robust ordering of the covariates so that we can choose a number of predictors from the top of the list. Efron et al. (2004) propose Least Angle Regression (LARS), a promising normal-theory algorithm that has clear advantages over the Forward Selection and the Forward Stagewise procedures. We illustrate the sensitivity of LARS to outliers and present two different approaches to its robustification. The robust LARS is computationally attractive because it avoids fitting all the submodels as well as the full model.

References

Efron, B.E., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least Angle regression. Annals of Statistics, to appear.

Morgenthaler, S., Welsch, R.E. and Zenide, A. (2003) Algorithms for robust model selection in linear regression. ICORS 2003 proceedings.

Ronchetti, E. (1985). Robust model selection in regression. Statistics and Probability Letters, 3, 21-23.

Ronchetti, E. and Staudte, R.G. (1994). A robust version of Mallows's $C_{p}$. Journal of the American Statistical Association, 89, 550-559.

Ronchetti, E., Field, C. and Blanchard, W. (1997) Robust linear model selection by cross-validation. Journal of the American Statistical Association, 92, 1017-1023.

Sommer, S. and Huggins, R.M. (1996). Variable selection using the Wald Test and Robust $C_{p}$. J.R. Statist. Soc., B, 45, 15-29.

## Screening for Potentially Informative Dropout in Longitudinal Studies with Binary Outcome

Tom Loeys

Merck Sharp & Dohme, Brussels, Belgium

Friday , 7 May 2004 , 14h00 - 15h00

A3 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Longitudinal studies are often faced with drop-out. The natural question that arises is whether the time to drop-out is associated with the longitudinal trajectory. Following Henderson et al. (Biostatistics, 2000), we address that question through joint modeling approaches for the time to drop-out and the longitudinal process. The proposed methodology is applied to a clinical trial in the acute treatment of migraine. In particular, we present a sensitivity analysis exploring the impact on the inference from the longitudinal data model when the time to drop-out from the study is related to the patient's headache relief trajectory through unobserved covariates.

## Missing Data Methods for Structural Equation Models

Peter M. Bentler

Departments of Psychology & Statistics, University of California, Los Angeles

Wednesday , 14 April 2004 , 16h00 - 17h00

Auditorium A , H. Dunantlaan 1, 9000 Gent, Belgium

### Abstract:

Traditional, newer and newest methods for the analysis of structural equation models based on incomplete data are discussed. Among the traditional methods are listwise deletion, pairwise present analyses, mean imputation, regression imputation, and hot deck imputation. Among the newer methods are the expectation-maximization (EM) algorithm and case-wise or direct maximum likelihood. Among the newest are statistically justified pairwise present methods, pseudo or robust maximum likelihood, and tests of homogeneity of means and covariances. Simulation studies of the performance of the newer and newest methods are discussed. Interpreting the role and limitations of these methods involves the concepts of MCAR (missing completely at random) and MAR (missing at random). Implementation via EQS 6 is mentioned.

## The valuation of Asian options for market models of exponential Lévy type

Hansjörg Albrecher

Department of Mathematics B, Graz University of Technology, Austria

Friday , 2 April 2004 , 13h30 - 14h30

A3 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

We consider asset price processes of exponential Lévy type and derive various approximations and bounds for the Esscher and the mean-correcting price of European arithmetic and geometric average options. Furthermore, a static super-hedging strategy based on comonotonicity is developed. Numerical illustrations of the accuracy of these bounds and approximations are given for normal inverse Gaussian and variance-gamma distributed log returns, respectively. Finally, we compare the option prices in these models with the corresponding Black-Scholes prices.

## Beyond ignorance: evaluating the plausibility of possible parameter estimates and inferences when data are missing

James Carpenter

London School of Hygiene and Tropical Medicine

Friday , 6 February 2004 , 13h00 - 14h00

A3 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

When some of the intended responses in a study are unobserved, an extra element of uncertainty is introduced into the analysis in addition to the familiar sampling imprecision.

In order to obtain parameter estimates, and make inferences, assumptions have to be made. It is thus important to examine the sensitivity of inference to these assumptions. If missing responses are categorical, a natural approach is to replace the usual point estimate with the range of estimates corresponding to all possible completions of the data. This leads naturally to optimistic/pessimistic bounds for a parameter, known as an interval of ignorance.
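For a single binary outcome, the optimistic/pessimistic bounds can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the speaker's implementation: the parameter is a plain sample proportion, and the function name `ignorance_interval` and the data are hypothetical.

```python
def ignorance_interval(observed, n_missing):
    """Range of the sample proportion over all completions of binary data.

    observed  -- list of observed 0/1 responses
    n_missing -- number of unobserved binary responses
    """
    n_total = len(observed) + n_missing
    successes = sum(observed)
    lower = successes / n_total                # every missing response set to 0
    upper = (successes + n_missing) / n_total  # every missing response set to 1
    return lower, upper


# Hypothetical data: 6 successes among 10 observed responses, 5 missing.
lower, upper = ignorance_interval([1] * 6 + [0] * 4, n_missing=5)
print(lower, upper)
```

The width of the interval, `n_missing / n_total`, depends only on the fraction of missing responses and does not shrink as the sample grows.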

However, often the distribution of parameter estimates within this interval is far from uniform. We therefore develop and describe two approaches applicable to discrete data modelled using generalised linear models.

The first we term the Estimates Above a Threshold (EAT) algorithm. This permits calculation of the proportion of parameter estimates which lie above a threshold under the analyst's choice of probability distribution for the missing data.
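The idea of weighting completions by a chosen missing-data distribution can be illustrated in a toy setting. The sketch below is not the EAT algorithm itself: it assumes the estimate is a simple sample proportion and that each missing binary response equals 1 independently with probability `p_missing`; the function name and the numbers are hypothetical.

```python
from math import comb


def proportion_above(successes, n_observed, n_missing, threshold, p_missing):
    """Probability that the completed-data proportion exceeds `threshold`.

    Each missing binary response is assumed to equal 1 independently with
    probability `p_missing` (the analyst's choice of missing-data model).
    """
    n_total = n_observed + n_missing
    probability = 0.0
    for k in range(n_missing + 1):  # k = number of missing responses equal to 1
        estimate = (successes + k) / n_total
        if estimate > threshold:
            # Binomial weight of all completions with exactly k ones
            probability += (
                comb(n_missing, k)
                * p_missing ** k
                * (1 - p_missing) ** (n_missing - k)
            )
    return probability


# Hypothetical data: 6 successes in 10 observed responses, 5 missing.
print(proportion_above(6, 10, 5, threshold=0.5, p_missing=0.5))
```

In this toy case the sum is exact because completions with the same number of ones give the same estimate; for a generalised linear model with covariates, sampling completions would be the more natural route.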

The second approach enables the calculation of the expected p-value, again under the analyst's choice of probability distribution for the missing data.

We illustrate our ideas with data from a randomised controlled trial (RCT) of different doses of a pain killer following molar extraction and a recent RCT of interventions to improve the quality of peer review.

This is joint work with Claudio Verzilli, Imperial College London.

## Developments in Longitudinal Studies

Geert Verbeke

Biostatistical Centre, K.U.Leuven, Belgium

Friday , 5 December 2003 , 13h00 - 14h00

A2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Nowadays, linear mixed models (Verbeke and Molenberghs, 2000) are probably the most frequently used models for the analysis of continuous longitudinal data or repeated measurements. This class of models has received a lot of attention in the statistical literature, and is available in a wide variety of commercial software packages (e.g., SAS, S-PLUS, ...).

Still, many of the statistical properties, such as interpretation of the parameters and sensitivity with respect to model misspecification, have received relatively little attention. In this presentation, some of these aspects will be looked at in more detail. It will be shown that, depending on the inferences of interest, results may or may not be highly affected by wrong distributional assumptions or by omission of important covariates. In cases of sensitivity, we will show how more robust inferences can be obtained from extended versions of the classical model.

Finally, some aspects of model formulation and parameter interpretation will be discussed. Linear mixed models can be interpreted hierarchically or marginally, and this has important consequences with respect to estimation and inference. Here, the likelihood ratio test and the score test will be discussed and compared in the context of linear mixed models.

All results will be extensively illustrated using real data.

Reference:

Verbeke, G. and Molenberghs, G. (2000). Linear mixed models for longitudinal data. Springer Series in Statistics, Springer-Verlag, New York.

## Methodology for genetic analyses of twins and families

Sylvie Goetgeluk

Ghent University, Department of Applied Mathematics and Computer Science

Friday , 31 October 2003 , 13h00 - 14h00

A2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available - Reading Club

## Functionals of clusters of extremes

Johan Segers

Tilburg University, Department of Econometrics and OR

Friday , 24 October 2003 , 13h00 - 14h00

A2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

For arbitrary stationary sequences of random variables satisfying a mild mixing condition, distributional approximations are established for functionals of clusters of exceedances over a high threshold. The approximations are in terms of the distribution of the process conditionally on the event that the first variable exceeds the threshold. This conditional distribution is shown to converge to a non-trivial limit if the finite-dimensional distributions of the process are in the domain of attraction of a multivariate extreme-value distribution. In this case, therefore, limit distributions are obtained for functionals of clusters of extremes, thereby generalizing results for higher-order stationary Markov chains by S. Yun (2000), J. Appl. Probab. 37, 29-44.

## The Shared Frailty Model

Paul Janssen

Center for Statistics, Limburgs Universitair Centrum

Friday , 12 September 2003 , 11h00 - 12h00

V3 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

We first show the usefulness of frailty models to describe multivariate survival data and we describe the importance of the heterogeneity parameter (the parameter of the frailty density).

In a second part we review some aspects of statistical inference for frailty models.

Finally we discuss the asymptotic behaviour of the likelihood ratio test for heterogeneity in shared frailty models.

## Semiparametric regression for repeated outcomes with nonignorable intermittent nonresponse

S. Vansteelandt, A. Rotnitzky and J. Robins

Ghent University, Gent, Belgium and Harvard School of Public Health, Boston, USA

Wednesday , 18 June 2003 , 16h00 - 17h00

A3 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

We examine a new class of models for making inference about the mean of a vector of correlated outcomes when the outcome vector is incompletely observed in some study units and missingness is non-monotone.

Each model in our class is indexed by a set of unidentified selection bias functions which quantify the residual association between the outcome at each occasion t and the probability that this outcome is missing at the t-th occasion, after adjusting for variables observed prior to time t and the past non-response pattern. In particular, selection bias functions equal to zero encode the investigator's a priori belief that non-response of the next outcome can be entirely explained by the observed past. We call this assumption sequential explainability.

Because each model in our class is non-parametric, it fits the data perfectly well. As such, our models are ideal for conducting sensitivity analyses aimed at evaluating the impact that different degrees of departure from sequential explainability have on inference about the marginal means of interest. Sensitivity analysis is conducted by examining how inference about the marginal means changes as one varies the selection bias functions regarded as known under each model.

We then extend our proposed class of models to incorporate (1) data configurations which include baseline covariates and (2) a parametric model for the conditional mean of the vector of correlated outcomes given the baseline covariates. We describe a class of estimators for the parameter indexing the conditional mean model which, up to asymptotic equivalence, comprises all consistent and asymptotically normal estimators of this parameter under the postulated model for non-response in the class. Finally, we describe a nearly efficient estimator of this parameter.

## Multiscale triangulations and second generation wavelets in nonlinear smoothing of scattered data

Maarten Jansen

Technische Universiteit Eindhoven, Department of Mathematics and Computer Science

Friday , 13 June 2003 , 13h00 - 14h00

A3 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Wavelet techniques have gained considerable popularity in non-parametric data smoothing. The very construction of classical wavelets makes them intrinsically unsuited for non-equispaced data. Some efforts have been made to deal with this problem in 1-d (one dimension). First, this talk presents the application of the so-called lifting scheme (by Sweldens), which allows for a natural extension of wavelet theory towards irregularly sampled data, both in 1-d and 2-d. Secondly, for the 2-d case, we present a multiscale (Delaunay) tessellation of the data, and a corresponding wavelet transform, based on Sibson interpolation. A third element of our approach is the actual estimation of the wavelet coefficients. We discuss simple thresholding as well as a Bayesian procedure, based on the Johnstone-Silverman model. Finally, we discuss some issues of numerical condition related to this wavelet transform.
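The thresholding step can be sketched independently of the transform that produced the coefficients. The abstract does not say which rule is meant; the example below assumes soft thresholding, and the function name, coefficients and threshold value are hypothetical.

```python
def soft_threshold(coefficients, threshold):
    """Shrink wavelet coefficients towards zero by `threshold`.

    Coefficients whose magnitude does not exceed the threshold are set to
    zero; the others keep their sign and lose `threshold` in magnitude.
    """
    shrunk = []
    for c in coefficients:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if c > 0 else -magnitude)
    return shrunk


# Hypothetical detail coefficients and threshold.
print(soft_threshold([2.5, -0.5, -3.0, 1.0], threshold=1.0))
```

Hard thresholding would instead keep the surviving coefficients unchanged; which variant suits irregularly sampled data is a modelling choice this sketch does not settle.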

## Mapping Soil Texture at a Regional Scale using Pedometrical Techniques

Marc Van Meirvenne

Department of Soil Management and Soil Care, Faculty of Agricultural and Applied Biological Sciences, Ghent University

Friday , 06 June 2003 , 13h00 - 14h00

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Pedometrical techniques are numerical methods used to describe and analyze soil properties in a quantitative way. Frequently these are related to spatial inventory, space-time modeling and statistical survey. We used such techniques to map soil texture at a regional scale.

The study area covered about 3000 km² and is located in Belgium. It was selected because it contains a large range of soil types with different geological histories. In total 4887 topsoil samples were analyzed for soil texture and a 1:100,000 choropleth soil texture map (based on the Belgian soil texture classification) was also available. However, an update of this map was required, as well as a reclassification according to the internationally accepted USDA texture triangle. Moreover, a quantitative map of the three major soil textural classes (clay, silt and sand) was needed as input for GIS-linked models.

To map the three textural fractions quantitatively we used compositional kriging, which is a version of ordinary kriging to which some conditions were added. One of these conditions is that the sum of the three fractions must equal 100, which is not ensured when each fraction is interpolated independently. The data set was stratified according to the delineations of the choropleth soil map. These delineations represent either crisp physical boundaries or transition zones. Therefore, different stratification and interpolation strategies were followed according to the nature of the map boundaries. The resulting maps were classified according to both the Belgian and the USDA textural triangles, allowing for the first time a comparison between both classification systems.

Finally, a sensitivity analysis was conducted to explore the uncertainty related to the textural classification. To this end, a Monte Carlo analysis was used, based on the kriging variance of the predictions of each textural fraction. This information can be used in a GIS whenever the mapping quality of the classified maps is required.

## Model averaging, post-model selection inference and the focussed information criterion

Gerda Claeskens

Universitair Centrum Limburg

Tuesday , 27 May 2003 , 15h30 - 16h30

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

The traditional use of model selection methods in practice is to proceed as if the final selected model had been chosen a priori, without acknowledging the additional uncertainty introduced by model selection. This often means underreporting of variability and too optimistic confidence intervals, for example. In addition to quantifying the implied cost of model selection involved in AIC and similar procedures, I give results for estimators that smooth across many models. This amounts to a frequentist parallel to the Bayesian model averaging methods.

In a second part of the talk I take the view that the model selector should focus on the parameter singled out for interest; in particular, a model which gives good precision for one estimand may be worse when used for inference for another estimand. This yields a focussed information criterion, the FIC.

This is joint work with Nils Lid Hjort.

## Inference on Survival Data with Covariate Measurement Error: An Imputation-based Approach

Yi Li

Harvard School of Public Health

Tuesday , 20 May 2003 , 16h00 - 17h00

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

We propose a new method for fitting proportional hazards models with error-prone covariates. Regression coefficients are estimated by solving an estimating equation that is the average of the partial likelihood scores based on imputed true covariates. For the purpose of imputation, a local spline model is assumed on the baseline hazard.

We discuss consistency and asymptotic normality of the resulting estimators, and propose a stochastic approximation scheme to obtain the estimates. The algorithm is easy to implement, and reduces to the ordinary Cox partial likelihood approach when the measurement error has a degenerate distribution. Simulations indicate high efficiency and robustness. We consider the special case where error-prone replicates are available on the unobserved true covariates. As expected, increasing the number of replicates for the unobserved covariates increases efficiency and reduces bias.

We illustrate the practical utility of the proposed method with an Eastern Cooperative Oncology Group clinical trial where a genetic marker, c-myc expression level, is subject to measurement error.

## Reporting and Statistics using SAS Enterprise Guide

Saar De Zutter

Department of Applied Mathematics and Computer Science, Ghent University

Friday , 9 May 2003 , 13h00 - 14h00

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available - READING CLUB

## The implementation of cancer registration in Belgium: a never-ending story?

Joost Weyler

Department of Epidemiology, Faculty of Medicine, University of Antwerp

Friday , 25 April 2003 , 13h00 - 14h00

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Belgium was one of the first countries in Europe to start with cancer registration on a national basis. However, there are important indications of poor quality of the registered data. Based on the experience in the Netherlands, different provincial cancer institutes emerged. One of the aims of these institutes was to set up regional cancer registries. To date, two cancer registries have emerged from these initiatives. One, the Limburg Cancer Registry (LIKAR), collects data from all pathologists in the province of Limburg. The other, the Antwerp Cancer Registry (AKR), is based on active registration by data nurses. Despite the high quality of the data registered by these two registries, their future is uncertain.

## The analysis of QoL data

Kristel Van Steen

Biostatistics, Center for Statistics, Limburgs Universitair Centrum

Friday , 11 April 2003 , 13h00 - 14h00

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

When evaluating the efficacy of medical treatment on cancer, prolongation of life expectancy and tumor shrinkage have traditionally been taken as outcome measures. Despite the substantial side effects and functional impairment often associated with cancer treatment, it is only recently that attention has been given to the assessment of quality of life (QoL). This increasing interest in QoL has important implications for clinical trials, as careful planning is required at all stages of a study from protocol design through reporting of results.

Following a classical protocol layout, we will discuss protocol contents relevant to QoL assessment. These topics include (1) the choice of QoL scale scoring system, (2) timing of assessments, (3) the mode of data collection, and (4) statistical considerations with an emphasis on data analysis methods.

Several different scales are common in QoL research, such as Likert scales, Visual Analogue Scales (VAS) and adjectival scales. In this talk special attention is given to the EORTC QLQ-C30 questionnaire with multi-item scales and single-item measures.

Obviously, analyzing quality of life data from these questionnaires may be complicated for several reasons. Quality of life data not only involves repeated measures, but it is also usually collected on ordered categorical responses. In addition, it is evident that not all patients provide the same number of assessments, due to attrition caused by death or other medical reasons. Some patients may fail to answer only a few questions or items on the questionnaire.

[...] assumptions will drive the analyst in the choice of an appropriate model and estimation technique.

ng the direction or magnitude of effects of the predictor variable on the response variable, even with a correct model specification. It is self-evident that these issues need to be addressed before drawing conclusions from a prognostic factor analysis. Model instability in prognostic factor analyses will be illustrated via bootstrap procedures and model-averaging methodology on data generated from EORTC phase III clinical trials.

## The S-Language

Saar De Zutter

Department of Applied Mathematics and Computer Science, Ghent University

Friday , 14 March 2003 , 13h00 - 14h00

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available - READING CLUB

## Score tests for detecting linkage to quantitative traits

Hein Putter

Leiden University Medical Centre

Friday , 28 February 2003 , 16h00 - 17h00

A0 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

This talk is concerned with statistical methods to localize genes responsible for quantitative traits, i.e. for "diseases" that can be measured on a quantitative scale, such as high blood pressure or cholesterol. The location of such a gene is called a quantitative trait locus (QTL). The first step in the search for QTLs is a "linkage analysis" [...] components likelihood ratio test is also asymptotically equivalent to this optimal Haseman-Elston test. This fact gives a theoretical explanation of the empirical observation from simulation studies reporting similar power of the variance components likelihood ratio test and the optimal Haseman-Elston method. If time permits, I will discuss extensions to support the simultaneous analysis of more than two loci and multivariate phenotypes.

## Modelling family data: from segregation to linkage analysis

Jeanine Houwing-Duistermaat

University Rotterdam

Friday , 24 January 2003 , 13h00 - 14h00

A2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available

## Introduction to Robust Statistics

Stefan Van Aelst

Ghent University

Friday , 17 January 2003 , 13h00 - 14h00

A2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available - READING CLUB

## The challenge of patient choice and non-adherence to treatment in RCTs of counselling and psychotherapy

Graham Dunn

University of Manchester

Friday , 10 January 2003 , 13h00 - 14h00

A2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Patient preferences, beliefs and motivation influence compliance with the psychological interventions in RCTs in psychiatry, and compliance in its turn is an important predictor of loss to follow-up. It has also been suggested that preferences themselves may modify the effects of the treatment received (Brewin & Bradley, BMJ 299, 313-5, 1989). Using a counterfactual causal model, I will illustrate the estimation of the Complier-Average Causal Effect (CACE) in a multi-centre psychotherapy trial for depression (ODIN: Dowrick et al., BMJ 321, 1450-4, 2000) which was subject to both lack of compliance and loss to follow-up. I will then ask how such methods might help us to evaluate the potential of partially-randomized 'patient preference' designs (Brewin & Bradley, 1989) and to compare them with other, possibly more promising, alternatives.

## Structural accelerated failure time models and recurrent events

An Vandebosch

Ghent University

Friday , 13 December 2002 , 13h00 - 14h00

A2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available

## Het schatten van parameters in rentevoetmodellen (Estimating parameters in interest rate models)

Ella Roelant

Ghent University

Friday , 6 December 2002 , 13h00 - 14h00

A2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available - PRESENTATION IN DUTCH

## Testability of the Coarsening At Random (CAR) assumption

Eric Cator

Delft University of Technology

Monday , 18 November 2002 , 16h00 - 17h00

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available

## Causal graphs in epidemiology

Stijn Vansteelandt

Ghent University

Wednesday , 6 November 2002 , 13h00 - 14h00

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available - READING CLUB

## A frailty model for HIV infection in mobile and non-mobile cohorts from a rural district of South Africa

Khangelani Zuma

University of Waikato, New Zealand

Friday , 20 September 2002 , 14h00 - 15h00

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

Not available

## Improving response prediction in direct marketing by optimizing for specific mailing depths

Van den Poel, A. Prinzie & P. Van Kenhove

Department of Marketing, Ghent University, Belgium

Friday , 20 September 2002 , 13h00 - 14h00

V2 , Krijgslaan 281, Building S9, 9000 Gent, Belgium

### Abstract:

We adapt binary logistic regression by iteratively changing the true values of the dependent variable during the maximum-likelihood estimation procedure. Those customers who rank lower than the cutoff in terms of predicted purchase probability, imposed by the mailing-depth restriction, will not contribute to the total likelihood. We illustrate our procedure on a real-life direct-marketing dataset comparing traditional response models to our innovative approach optimising for a specific mailing depth. The results show that for mailing depths up to 48% our method achieves significant and substantial profit increases.