September 15, 2017
David Steinberg
Most of us come across new ideas and interesting research by attending conferences and reading journals. Naturally, we begin with those meetings and journals that are in our own field. However, many articles with interesting statistical content appear in other journals and meetings. This should not be surprising: statistics is a part of research in almost every discipline and sometimes the statistical challenges themselves are first presented in articles aimed at those in a particular field of study, whether economics, chemistry or marketing. A good example is conjoint analysis, which was developed largely by researchers in marketing.
The goal of this column is to bring such interesting articles to the attention of ISBIS members. Three examples from different areas are presented here. The first is an article on the efficiency of allocation schemes for on-line experiments (so-called A/B testing). The second describes the use of random effects models to characterize performance curves for a medical treatment. The third derives and studies a model for generating a housing price index that can be adapted to local time and geographical scales.
I think that articles like these will be of interest at both the applied and the research level. For practitioners, they describe novel applications published in journals they might otherwise have missed entirely. For researchers, they may provide first clues to “hot” new problems where they can make constructive advances in study design or data analysis.
We intend to make this column a regular feature of the ISBIS Newsletter. Let us know if you agree with us that it is a worthwhile addition to the Newsletter.
Making this idea work, of course, requires good article descriptions. So I want to appeal to you, our members, to consider contributing a description for the newsletter.
Here are some guidelines for contributions to the column. Articles should have interesting statistical content but be published in a non-statistical venue (including arXiv). The article might present an interesting application, or a problem or data source that requires new statistical methods. The article might itself develop methods. Article summaries should be short (say up to 2/3 of a page). They should describe the problem and explain why it should interest other ISBIS members. Sometimes it will suffice just to copy the abstract. Often, though, abstracts are directed toward those with knowledge of the field, who don’t need to be convinced that the problem is important. See yourself as a salesperson: you found this article interesting, so tell us why. What makes it worth reading? Please include bibliographic information on the article so that other members can easily find it. Send contributions to me at dms@post.tau.ac.il.
If you regularly read field journals, please keep this column in mind so that you can share with your colleagues some of the interesting material that you read.
Analysis of Thompson Sampling for the Multi-armed Bandit Problem. Shipra Agrawal and Navin Goyal. Journal of Machine Learning Research: Workshop and Conference Proceedings vol 23 (2012) 39.1–39.26
IT companies have become major users of designed experiments. They run “A/B experiments” to compare alternative designs of internet pages, with the goal of increasing the number of users who click on relevant links and, ultimately, purchase services. These experiments are often run on high-volume sites, making it important to learn quickly which options are best and then to implement them. This paper studies one of the popular sampling strategies, known as Thompson sampling. This is a randomized algorithm in which each candidate landing page is shown to the next user with probability equal to the current probability, given the data observed so far, that it is the best of all the options. As experimental evidence accumulates, more and more users are presented with the page(s) thought to be best. This interesting article by Agrawal and Goyal presents a theoretical analysis of what is achieved by Thompson sampling. Their analysis focuses on what is known as the regret: the difference between the total reward that would have been obtained by always offering the best landing page and the reward that the algorithm actually achieves. They show that the expected regret for Thompson sampling is bounded by a term that is logarithmic in the experimental horizon.
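To make the sampling rule concrete, here is a minimal sketch of Thompson sampling for click/no-click outcomes. It assumes independent uniform Beta(1, 1) priors on each page's click rate, which is a common textbook setup and not necessarily the exact setting analyzed in the paper. The function name `thompson_sampling`, the click rates and the horizon are all hypothetical choices made for illustration.

```python
import random


def thompson_sampling(true_rates, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative sketch only).

    true_rates: hypothetical click probabilities, unknown to the algorithm.
    Returns the number of times each page was shown and the cumulative
    expected regret relative to always showing the best page.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k  # clicks observed for each page
    failures = [0] * k   # non-clicks observed for each page
    pulls = [0] * k
    best_rate = max(true_rates)
    regret = 0.0

    for _ in range(horizon):
        # Draw one plausible click rate per page from its Beta posterior
        # (assumed Beta(1, 1) prior), then show the page with the largest draw.
        # This selects each page with its posterior probability of being best.
        draws = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                 for i in range(k)]
        page = max(range(k), key=lambda i: draws[i])

        # Simulate the user's response and update that page's counts.
        if rng.random() < true_rates[page]:
            successes[page] += 1
        else:
            failures[page] += 1
        pulls[page] += 1
        regret += best_rate - true_rates[page]

    return pulls, regret


# Hypothetical example: three pages with click rates of 5%, 10% and 30%.
pulls, regret = thompson_sampling([0.05, 0.10, 0.30], horizon=5000)
print("times each page was shown:", pulls)
print("cumulative expected regret:", round(regret, 1))
```

In a run like this, most of the traffic should end up on the page with the highest click rate, and the cumulative regret should grow far more slowly than the number of users, which is the behavior that the logarithmic bound describes.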
Personalized prediction of chronic wound healing: An exponential mixed effects model using stereophotogrammetric measurement. Yifan Xu, Jiayang Sun, Rebecca R. Carter and Kath M. Bogie. Journal of Tissue Viability, 2014, 23, pp. 48-59
In many studies, performance is studied by tracking a curve over time. Interesting statistical issues arise for modeling such performance curves. This article examines the healing process for pressure ulcers, exploiting detailed data over a period of approximately 40 days from the initiation of treatment. Sophisticated imaging equipment was used to make weekly measurements of the size of the ulcers. The data set included a total of 147 images from 13 wounds on 10 patients. The modeling challenges include how to characterize the trend over time, how the performance curves differ across wounds and subjects, and whether random effect terms are necessary to describe this variation. The authors present several different mixed effect models, both linear and nonlinear, that might be considered for representing these data. The best fitting model was a mixed-effects exponential decrease model, in which observed wound size is predicted by time and by initial wound size. Random effects capture individual differences with respect to the healing pattern. The model is used to investigate some metrics for healing and to compute personalized prediction intervals based on the initial data from a patient.
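As a rough illustration of what a mixed-effects exponential decrease model can look like, one generic form is sketched below. The notation and parameterization are assumptions made here for illustration and may differ from the model fitted in the article: $A_{ij}$ denotes the measured size of wound $i$ at time $t_{ij}$, $A_{i0}$ its initial size, $\beta$ a common healing rate, and $b_i$ a wound-specific random deviation from that rate.

```latex
A_{ij} = A_{i0}\,\exp\!\bigl\{-(\beta + b_i)\,t_{ij}\bigr\} + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2).
```

Under a form like this, a wound with $b_i > 0$ heals faster than average and one with $b_i < 0$ heals more slowly, so early measurements on a new patient can be used to update the estimate of $b_i$ and hence the predicted healing curve.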
Achieving a Hyperlocal Housing Price Index: Overcoming Data Sparsity by Bayesian Dynamical Modeling of Multiple Data Streams. You Ren, Emily B. Fox, and Andrew Bruce.
Understanding how housing values evolve over time is important to policy makers, consumers and real estate professionals. Existing housing indices are computed at a coarse spatial granularity, such as metropolitan regions. However, real estate prices often have important local dynamics, and regional estimates are unable to reflect and model these local features. A major challenge in deriving estimates at a local level is the sparsity of house sales data in the relevant spatio-temporal window. Ren, Fox and Bruce develop a novel analysis to address this challenge, producing monthly estimates at the census tract level. The central idea in their approach is a latent factor model for clustering census tracts into groups that have similar property value dynamics. The latent factor is characterized by similarities in market behavior and not by geographical proximity. The authors note that tracts that have similar market activity are often geographically distant from one another; as an example they note the similarity of tracts with waterfront properties. A nonparametric Bayesian clustering approach is used for the latent process, so the clustering is entirely data driven. The authors explore ways to scale the approach to finer levels of spatial and temporal resolution. They also suggest how to take advantage of parallel computation. They illustrate the method by analyzing a large data set from the metropolitan Seattle area, which includes more than 120,000 transactions in 140 census tracts over more than 15 years.