An introduction to unbiased [and doubly robust] estimators

Often, data collection cannot be completely random – e.g. in clinical trials, where it can be unethical to randomly assign treatments, or in online surveys, where response cannot be guaranteed. In such cases the data are biased, so any inferences drawn or machine learning models built from them will not generalize well to the overall population. This is where unbiased estimation comes in: the dataset is effectively reweighted so that it better represents a random sample.
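
As a minimal sketch of the reweighting idea (a toy example of my own using inverse-propensity weighting, one standard unbiased estimator; doubly robust methods build on it by also fitting an outcome model), suppose we know each individual's probability of responding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: outcome y correlated with a covariate x; true mean of y is 2.0.
x = rng.normal(size=100_000)
y = 2.0 + x + rng.normal(size=x.size)

# Biased collection: people with large x are more likely to respond.
p_respond = 1 / (1 + np.exp(-x))           # response propensity
observed = rng.random(x.size) < p_respond

naive = y[observed].mean()                  # biased upward: over-represents large x
# Horvitz-Thompson estimator: weight each response by 1 / p_respond.
ipw = np.sum(y[observed] / p_respond[observed]) / x.size

print(f"naive = {naive:.3f}, IPW = {ipw:.3f}  (true mean = 2.0)")
```

Each respondent effectively stands in for $1/p_i$ people like them, which undoes the over-representation of likely responders.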

Read More

Feature transformations for tree-based methods

Tree-based methods are fantastic at finding nonlinear boundaries, particularly when used in ensembles or within boosting schemes. However, there’s a lot to be learned about the humble lone decision tree that is generally overlooked (read: I overlooked these things when I first began my machine learning journey). In what follows I will briefly discuss how transformations of your data can vastly improve the ability of single trees to capture nonlinearities. If you are not yet familiar with decision trees, Elements of Statistical Learning has a wonderful overview.
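
To make the point concrete before the full post, here is a toy illustration of my own (not necessarily the transformations discussed inside): a tree's axis-aligned splits approximate a diagonal boundary poorly, but a single engineered feature makes it trivial:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = (X[:, 0] > X[:, 1]).astype(int)   # diagonal boundary: not axis-aligned

# Shallow tree on raw features: axis-aligned splits stair-step the diagonal.
raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Add the transformed feature x1 - x2: the boundary becomes one split at 0.
X_aug = np.column_stack([X, X[:, 0] - X[:, 1]])
aug = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_aug, y)

print(f"raw features (depth 3): {raw.score(X, y):.3f}")
print(f"with x1 - x2 (depth 1): {aug.score(X_aug, y):.3f}")
```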

Read More

Central limit theorem

The central limit theorem states that, given a parent distribution with mean $\mu$ and finite variance $\sigma^2$, if you independently draw $k$ values from this parent distribution, average them to get one sample mean, then repeat this process over and over, the distribution formed by these sample means is approximately normal with mean $\mu$ and variance $\sigma^2/k$, with the approximation becoming exact as $k \to \infty$.
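
A quick simulation of my own makes this concrete, using an exponential parent distribution (for which $\mu = 1$ and $\sigma^2 = 1$):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_repeats = 50, 10_000

# Parent distribution: exponential with rate 1, so mu = 1 and sigma^2 = 1.
draws = rng.exponential(scale=1.0, size=(n_repeats, k))
means = draws.mean(axis=1)

# CLT predicts the sample means are ~ Normal(mu, sigma^2 / k).
print(f"mean of sample means: {means.mean():.4f}  (CLT predicts 1.0)")
print(f"var  of sample means: {means.var():.4f}  (CLT predicts {1.0 / k:.4f})")
```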

Read More

The assumption of linearity behind logistic regression

Logistic regression is an adaptation of linear regression used to predict categorical outcomes. But the linearity is not directly apparent and is generally left undiscussed. In using logistic regression, we are actually assuming that the logarithm of the ratio between the probabilities of any two classes scales linearly with the features. I’ll briefly lay out why this is the case.
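
For the binary case, this is a one-line computation (shown here for completeness). Writing the model's probability as $p = \sigma(\beta^T x)$ with the logistic function $\sigma(z) = 1/(1 + e^{-z})$, the log-odds are

$$\log \frac{p}{1-p} = \log \frac{1/(1+e^{-\beta^T x})}{e^{-\beta^T x}/(1+e^{-\beta^T x})} = \log e^{\beta^T x} = \beta^T x,$$

i.e. linear in the features; the multiclass (softmax) case gives the analogous statement for the log-ratio of any two class probabilities.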

Read More

A brief comment on the loss function for logistic regression

Recently, the similarity of logistic regression to linear regression led me to erroneously suggest that logistic regression, too, has an analytical solution obtainable through matrix inversion (i.e. an analogue of the $\beta = (X^T X)^{-1} X^T y$ solution of linear regression). But this is not the case!
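
The reason is that setting the gradient of the negative log-likelihood to zero yields equations that are nonlinear in $\beta$, so the fit must be found iteratively. A minimal gradient-descent sketch of my own (for illustration, not production code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_steps=5000):
    """Minimize the negative log-likelihood by gradient descent.

    Setting the gradient X.T @ (sigmoid(X @ beta) - y) to zero is
    nonlinear in beta, so unlike linear regression there is no
    closed-form (matrix-inversion) solution.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (sigmoid(X @ beta) - y) / len(y)  # d(NLL)/d(beta)
        beta -= lr * grad
    return beta

# Toy data: intercept column plus one feature, true beta = [-1, 2].
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])
y = (rng.random(500) < sigmoid(-1 + 2 * x)).astype(float)

print(fit_logistic(X, y))   # should approach [-1, 2]
```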

Read More