A Simple Problem On Inferencing From Censored Data
2019-03-18
The Problem
This problem has several formulations. A train arrives precisely every $\theta$ minutes and you record all of your wait times. A ruler has length $\theta$ and you keep track of all the objects short enough to be measured with it. A bug sits in a glass of height $\theta$, and any jump that clears the rim lands it in the hot water outside, so it only remembers its unsuccessful jumps. The goal is the same in every case: how would you estimate the unknown $\theta$ (the train interval, the ruler length, or the glass height) from your data, say $N$ measurements?
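A minimal R sketch of the ruler version makes the truncation concrete. It assumes, purely for illustration, that object lengths are uniform on $[0, 3]$ and that the (normally unknown) ruler length is theta = 2; only objects that fit on the ruler are recorded.

```r
set.seed(1)

theta <- 2                                  # hypothetical ruler length (the unknown)
lengths <- runif(10000, min = 0, max = 3)   # hypothetical lengths of all objects
recorded <- lengths[lengths <= theta]       # only objects that fit are recorded

n_recorded <- length(recorded)   # roughly 2/3 of the objects fit
max_recorded <- max(recorded)    # never exceeds theta
cat("recorded:", n_recorded, " largest:", round(max_recorded, 4), "\n")
```

Every recorded length is at most theta, so the largest one can only fall short of the truth. Because the original lengths are uniform, the recorded ones are uniform on $[0, \theta]$.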
Solution
Answering this properly requires some rigor, but here intuition agrees with the rigorous answer very well.
Obviously the maximum of your observations, i.e. the $N$th order statistic $X_{(N)}$, is an estimator of the unknown, but it will always underestimate the truth. We need to scale it up slightly to get an unbiased estimator.
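Suppose your observations come from a uniform distribution on $[0, \theta]$, with $\theta$ the unknown (you arrive at the station at a uniformly random time, the lengths of the objects you measure are uniformly distributed, or the bug jumps to uniformly distributed heights). Then intuitively, the $N$ observations cut $[0, \theta]$ into $N + 1$ gaps of equal expected length, so the maximum $X_{(N)}$ sits on average one gap below $\theta$:
$$E[X_{(N)}] = \frac{N}{N+1}\,\theta.$$
Then,
$$E\left[\frac{N+1}{N}\,X_{(N)}\right] = \frac{N+1}{N}\cdot\frac{N}{N+1}\,\theta = \theta.$$
So $\frac{N+1}{N}\,X_{(N)}$ is an unbiased estimator of $\theta$.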
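The unbiasedness claim is easy to check by simulation. This R sketch assumes a hypothetical theta = 5 and N = 10, repeats the experiment 100000 times, and averages both the raw maximum and the scaled estimator.

```r
set.seed(42)

theta <- 5        # hypothetical true value of the unknown
N <- 10           # measurements per experiment
n_rep <- 100000   # number of repeated experiments

# Each column holds one experiment: N uniform draws on [0, theta]
samples <- matrix(runif(N * n_rep, min = 0, max = theta), nrow = N)
max_obs <- apply(samples, 2, max)     # X_(N) for every experiment
theta_hat <- (N + 1) / N * max_obs    # scaled-up estimator

# The raw maximum should average about N / (N + 1) * theta = 4.545...,
# while the scaled estimator should average about theta = 5
c(mean_max = mean(max_obs), mean_theta_hat = mean(theta_hat))
```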
One concern is that the variance of this estimator might be large; at this point your gut tells you that the variance will decrease as the sample size increases. So let's try a more rigorous approach.
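Say your observations $X_1, \dots, X_N$ are i.i.d. from a distribution with CDF $F$ and density $f$. Then the distribution of the $N$th order statistic $X_{(N)} = \max_i X_i$ is
$$P(X_{(N)} \le x) = \prod_{i=1}^{N} P(X_i \le x) = F(x)^N, \qquad f_{X_{(N)}}(x) = N\,F(x)^{N-1} f(x).$$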
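A quick Monte Carlo check of $P(X_{(N)} \le x) = F(x)^N$ for a distribution that is not uniform. The choice of Exp(1) draws, N = 5 and the evaluation grid are arbitrary assumptions made only for illustration.

```r
set.seed(7)

N <- 5             # sample size per experiment
n_rep <- 200000    # number of simulated experiments
x_grid <- c(0.5, 1, 1.5, 2, 3)   # points at which to compare the CDFs

# Each row is one experiment of N i.i.d. Exp(1) draws; take the row maxima
draws <- matrix(rexp(N * n_rep, rate = 1), nrow = n_rep)
max_obs <- apply(draws, 1, max)

empirical <- sapply(x_grid, function(x) mean(max_obs <= x))  # simulated P(X_(N) <= x)
theoretical <- pexp(x_grid, rate = 1)^N                     # F(x)^N

round(rbind(x = x_grid, empirical = empirical, theoretical = theoretical), 4)
```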
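Let's say we have a uniform distribution on $[0, \theta]$ as assumed previously, so $f(x) = 1/\theta$ and $F(x) = x/\theta$ for $0 \le x \le \theta$. Then the distribution of the $N$th order statistic is
$$f_{X_{(N)}}(x) = \frac{N\,x^{N-1}}{\theta^N}, \qquad 0 \le x \le \theta.$$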
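Say $Y = X_{(N)}/\theta$; then $f_Y(y) = N y^{N-1}$ on $[0, 1]$, i.e. $Y \sim \mathrm{Beta}(N, 1)$. From the mean and variance of the Beta distribution, and linearity of expectation, we have
$$E[Y] = \frac{N}{N+1} \;\Rightarrow\; E\left[\frac{N+1}{N}\,X_{(N)}\right] = \frac{N+1}{N}\,\theta\,E[Y] = \theta,$$
$$\mathrm{Var}(Y) = \frac{N}{(N+1)^2(N+2)} \;\Rightarrow\; \mathrm{Var}\left(\frac{N+1}{N}\,X_{(N)}\right) = \frac{\theta^2}{N(N+2)} \to 0 \text{ as } N \to \infty.$$
This establishes both of the earlier intuitions.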
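The Beta identification and the moment formulas can be checked numerically. This sketch assumes a hypothetical theta = 3 and N = 20: `pbeta(y, N, 1)` should agree with $y^N$, and the simulated mean and variance of $\frac{N+1}{N}X_{(N)}$ should be close to $\theta$ and $\theta^2/(N(N+2))$.

```r
set.seed(2019)

theta <- 3        # hypothetical true value
N <- 20           # measurements per experiment
n_rep <- 100000   # number of repeated experiments

# Y = X_(N) / theta has CDF y^N on [0, 1], which is the Beta(N, 1) CDF
y_grid <- seq(0, 1, by = 0.1)
cdf_gap <- max(abs(pbeta(y_grid, N, 1) - y_grid^N))

# Simulate the estimator (N + 1) / N * X_(N)
samples <- matrix(runif(N * n_rep, min = 0, max = theta), nrow = N)
theta_hat <- (N + 1) / N * apply(samples, 2, max)

theory_var <- theta^2 / (N * (N + 2))
c(cdf_gap = cdf_gap, sim_mean = mean(theta_hat),
  sim_var = var(theta_hat), theory_var = theory_var)

# The theoretical variance shrinks roughly like theta^2 / N^2 for large N
sapply(c(5, 20, 100, 1000), function(n) theta^2 / (n * (n + 2)))
```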
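But here is a further twist: what fraction of the time will this estimator underestimate the truth, and how does that fraction change with $N$? Using $P(X_{(N)} \le x) = (x/\theta)^N$, this is
$$P\left(\frac{N+1}{N}\,X_{(N)} < \theta\right) = P\left(X_{(N)} < \frac{N}{N+1}\,\theta\right) = \left(\frac{N}{N+1}\right)^N = \left(1 - \frac{1}{N+1}\right)^N \to e^{-1} \quad (N \to \infty).$$
That's about $e^{-1} \approx 36.8\%$ of the time for large $N$; the fraction falls from $1/2$ at $N = 1$ toward this limit, so a sizeable share of estimates stays below the truth no matter how much data you collect.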
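The convergence can be tabulated directly. This R sketch evaluates the exact underestimation probability $(N/(N+1))^N$ for a few arbitrarily chosen sample sizes, once in closed form and once through the Beta(N, 1) CDF, next to the limit $e^{-1}$.

```r
N <- c(1, 2, 5, 10, 100, 1000, 1e5)   # arbitrary sample sizes

p_under <- (N / (N + 1))^N             # closed form: P(estimate < theta)
p_beta  <- pbeta(N / (N + 1), N, 1)    # same probability via the Beta(N, 1) CDF

print(data.frame(N = N, p_under = signif(p_under, 5),
                 p_beta = signif(p_beta, 5), limit = signif(exp(-1), 5)))
```

The probability decreases monotonically in $N$ but always stays above $e^{-1}$.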
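To check this numerically for $N = 10^5$ (with $\theta = 1$), compare the exact probability from the Beta CDF with a Monte Carlo estimate from 5000 simulated data sets: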
R> N = 1e5
R> pbeta(1-(1/(N+1)), N, 1)
# [1] 0.3678813
R> set.seed(123)
R> sum(replicate(5000, max(runif(N))*(N+1)/N < 1))
# [1] 1826
R> 1826/5000
# [1] 0.3652
© The Responsible Adult 2017 - 2025