Probability Tidbits 5 - Random Variables

Random variables aren’t actually variables. They’re measurable functions (see previous post here) between a measurable space $(\Omega, \mathcal{F})$ and $(\mathbb{R}, \mathcal{B})$ (I’ll use $\mathcal{B}$ as shorthand for the $\mathcal{B}(\mathbb{R})$ I’ve written out in previous posts). Recall that for a measurable function $h: \Omega \to \mathbb{R}$, $\forall B \in \mathcal{B}, h^{-1}(B) \in \mathcal{F}$.

Why do we care?

Why do we work with random variables instead of the probability space $(\Omega, \mathcal{F}, P)$ in the first place? This is more of a philosophical question about statistics than one with a concrete answer, but I’ll try my best:

We often make discrete decisions based on uncountably many outcomes. When we want to go outside, our brain asks “is it going to be hot or cold?”. When we perform experiments to either reject or fail to reject the null hypothesis, we are likewise tasked with a binary outcome. Temperature can be expressed as a value in $[0, \infty) \subset \mathbb{R}$, so there are uncountably many outcomes. We don’t really ask “will the temperature be exactly 3 kelvin?” to decide whether we should wear a heavy coat; instead we ask “what’s the probability it’s going to be cold, i.e. below some threshold temperature $T$?”, which we express as \(P(\{\omega: \omega \leq T\})\). An intuitive way to represent this is via a random variable:

\[X(\omega) = \begin{cases} 0 & \text{if } \omega \leq T \\ 1 & \text{otherwise} \end{cases}\]

This maps our observations $[0, \infty)$ to $\{0, 1\}$. Here, the $\sigma$-algebra generated by $X$ is:

\[\sigma(\{\{\omega: \omega \leq T\}, \{\omega: \omega > T\}\}) = \{\{\omega: \omega \leq T\}, \{\omega: \omega > T\}, \emptyset, \Omega\}\]

This is obviously a subset of $\mathcal{F}$, but these four sets are perhaps the only measurable sets we need to answer our question.
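To make this concrete, here is a minimal sketch on a finite stand-in for $\Omega$ (a hypothetical discretization of temperatures, with threshold $T = 15$). The preimages of $X$ partition $\Omega$ into "cold" and "warm", and $\sigma(X)$ is exactly the four sets above:

```python
# Hypothetical finite stand-in for Omega = [0, infinity): temperatures 0..29.
T = 15
omega = frozenset(range(30))
X = lambda w: 0 if w <= T else 1      # the indicator random variable from the text

# Preimages of each value of X partition Omega.
cold = frozenset(w for w in omega if X(w) == 0)   # {omega: omega <= T}
warm = frozenset(w for w in omega if X(w) == 1)   # {omega: omega > T}

# sigma(X): the smallest sigma-algebra making X measurable.
sigma_X = {frozenset(), cold, warm, omega}
print(len(sigma_X))  # 4: emptyset, the "cold" set, its complement, and Omega
```

Note that `cold` and `warm` are complements of each other in `omega`, which is why closing the generators under complement and union adds only $\emptyset$ and $\Omega$.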

Random variables generate $\sigma$-algebras

In general, given an index set $C$ (like $\mathbb{N}$, for example) and a collection of random variables $X_n: \Omega \to \mathbb{R}$, $n \in C$, from a measurable space $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B})$, we can define the smallest $\sigma$-algebra $\mathcal{X}$ such that each $X_n$ is $\mathcal{X}$-measurable:

\[\sigma(X_n: n \in C) := \sigma(\{\{\omega \in \Omega: X_n(\omega) \in B\} : n \in C, B \in \mathcal{B}\})\]

In other words, this is the $\sigma$-algebra generated by how each random variable partitions the original $\Omega$ via its values in $\mathbb{R}$. As we observe more random variables in the sequence $X_n$, the generated $\sigma$-algebra becomes more granular and contains more measurable sets, which corresponds to being able to ask more complicated questions about $\Omega$.
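This growth is easy to see on a finite $\Omega$, where $\sigma(X_1, \dots, X_n)$ is exactly the set of unions of atoms of the partition "$\omega \sim \omega'$ iff $X_i(\omega) = X_i(\omega')$ for all $i$". A sketch, with hypothetical example random variables:

```python
from itertools import chain, combinations

def atoms(omega, rvs):
    # Group outcomes by the joint value (X_1(w), ..., X_n(w)): these groups
    # are the atoms of the partition induced by the random variables.
    groups = {}
    for w in omega:
        groups.setdefault(tuple(X(w) for X in rvs), set()).add(w)
    return [frozenset(g) for g in groups.values()]

def generated_sigma_algebra(omega, rvs):
    # On a finite Omega, the generated sigma-algebra is all unions of atoms.
    parts = atoms(omega, rvs)
    return {frozenset(chain.from_iterable(c))
            for r in range(len(parts) + 1)
            for c in combinations(parts, r)}

omega = set(range(8))
X1 = lambda w: w % 2          # parity (hypothetical)
X2 = lambda w: w // 4         # "high bit" (hypothetical)
print(len(generated_sigma_algebra(omega, [X1])))       # 4  (2 atoms)
print(len(generated_sigma_algebra(omega, [X1, X2])))   # 16 (4 atoms)
```

Observing $X_2$ on top of $X_1$ refines the partition from 2 atoms to 4, so the $\sigma$-algebra grows from $2^2$ to $2^4$ sets, and $\sigma(X_1) \subset \sigma(X_1, X_2)$.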

Laws & distributions

For a random variable $X: \Omega \to \mathbb{R}$ and a probability measure $P: \mathcal{F} \to [0, 1]$, the probability law of $X$ maps sets in the image $\sigma$-algebra $\mathcal{B}$ to probabilities. To be precise, $\mathcal{L}_X: \mathcal{B} \to [0, 1]$, $\mathcal{L}_X := P \circ X^{-1}$, where $X^{-1}$ denotes the preimage under $X$. As we know, $\{(-\infty, c] : c \in \mathbb{R}\}$ is a $\pi$-system that generates $\mathcal{B}$ (shown using a d-system argument). Evaluating the law on the elements of this $\pi$-system gives our familiar friend, the cumulative distribution function of $X$:

\[F_X(c) := \mathcal{L}_X((-\infty, c]) = P(X \leq c)\]
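A minimal sketch of the law as a pushforward $P \circ X^{-1}$ on a finite space (hypothetical setup: $\Omega$ is die rolls with $P$ uniform, and $X$ is the parity of the roll):

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}   # uniform probability measure
X = lambda w: w % 2                      # parity of the roll

def law(B):
    # L_X(B) = P(X^{-1}(B)) = P({w : X(w) in B})
    return sum(P[w] for w in omega if X(w) in B)

def F_X(c):
    # CDF: the law evaluated on the pi-system set (-inf, c]
    return sum(P[w] for w in omega if X(w) <= c)

print(law({1}))   # Fraction(1, 2): the odd rolls {1, 3, 5}
print(F_X(0))     # Fraction(1, 2): P(X <= 0) = P(roll is even)
```

Note `law` and `F_X` never look at $\mathbb{R}$ directly: everything is computed by pulling sets back to $\Omega$ and measuring them with $P$, which is exactly $P \circ X^{-1}$.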

Cumulative distribution functions defined this way are always right-continuous, which means

\[F_X(c) = \lim_{x \to c^+} F_X(x)\]

Denote \(A = \{\omega : X(\omega) \leq x\}\) and note that $F_X(x) = P(A)$. Consider the sequence of sets \(A_n = \{\omega: X(\omega) \leq x + a_n\}\), where \(\{a_n\}_n\) is a monotonically decreasing sequence converging to $0$ (if a sequence converges to $x$ from the right, we can always extract such a monotone subsequence). Then $A \subset \dots \subset A_n \subset A_{n-1} \subset \dots \subset A_1$ and $\lim_{n \to \infty} A_n = \bigcap_n A_n = A$. By continuity of measure from above, $P(A_n) \to P(A)$, which gives the result. Note that cumulative distribution functions are not left-continuous in general, though: if we instead approach from the left with sets $B_n = \{\omega: X(\omega) \leq x - a_n\}$ satisfying $B_1 \subset \dots \subset B_{n-1} \subset B_n \subset \dots \subset A$, the limit is \(\lim_{n \to \infty} B_n = \{\omega : X(\omega) < x\}\), which notably misses \(\{\omega : X(\omega) = x\}\). If $X$ has a point mass at $x$, that set has positive probability, so the left limit falls short of $F_X(x)$.
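A quick numerical sketch of this asymmetry, assuming a hypothetical two-point distribution $P(X = 0) = P(X = 1) = 0.5$, which puts a point mass at $0$:

```python
# CDF of the two-point distribution P(X=0) = P(X=1) = 0.5 (hypothetical).
def F(c):
    return 0.0 if c < 0 else (0.5 if c < 1 else 1.0)

eps = [10**-k for k in range(1, 8)]
right = [F(0 + e) for e in eps]   # stays at F(0) = 0.5 -> right-continuous at 0
left  = [F(0 - e) for e in eps]   # stays at 0.0 != F(0) -> left limit misses P(X = 0)
print(F(0), right[-1], left[-1])  # 0.5 0.5 0.0
```

The right limit agrees with $F(0)$ because the sets $\{X \leq \varepsilon\}$ shrink down to $\{X \leq 0\}$; the left limit only captures $\{X < 0\}$, missing the point mass $P(X = 0) = 0.5$.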