{ "cells": [ { "cell_type": "markdown", "id": "9941e1ce-2ca3-4581-a187-9676c6442d1a", "metadata": {}, "source": [ "(lecture_19)=\n", "# Generalized Linear Madness\n", ":::{post} Jan 7, 2024\n", ":tags: statistical rethinking, bayesian inference, generalized linear models\n", ":category: intermediate\n", ":author: Dustin Stansbury\n", ":::\n", "\n", "This notebook is part of the PyMC port of the [Statistical Rethinking 2023](https://github.com/rmcelreath/stat_rethinking_2023) lecture series by Richard McElreath.\n", "\n", "[Video - Lecture 19 - Generalized Linear Madness](https://youtu.be/zffwg0xDOgE)# [Lecture 19 - Generalized Linear Madness](https://www.youtube.com/watch?v=zffwg0xDOgE)" ] }, { "cell_type": "code", "execution_count": 1, "id": "2fb45089-f9d4-4129-b7ff-3bcb13ed8ba7", "metadata": {}, "outputs": [], "source": [ "# Ignore warnings\n", "import warnings\n", "\n", "import arviz as az\n", "import numpy as np\n", "import pandas as pd\n", "import pymc as pm\n", "import statsmodels.formula.api as smf\n", "import utils as utils\n", "import xarray as xr\n", "\n", "from matplotlib import pyplot as plt\n", "from matplotlib import style\n", "from scipy import stats as stats\n", "\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "# Set matplotlib style\n", "STYLE = \"statistical-rethinking-2023.mplstyle\"\n", "style.use(STYLE)" ] }, { "cell_type": "markdown", "id": "f9465b34-195e-4dc5-9eb3-329eefd6ee70", "metadata": {}, "source": [ "# GLMs & GLMM & Generalized Linear Habits\n", "\n", "- Flexible association engines, not mechanistic scientific models of mechanisms that generate phenomena\n", " - May leave information on the table\n", " - May be less efficient\n", " - Mechanistic causal models often more interpretable\n", "- Rather they are information-theoretic devices\n", "- Even still, >90% of scientific work is done using GLM/GLMMs\n", "\n", "**Alternative Approach - Start with simple sceintific principles, and build in estimation**\n", "\n", "## Revisiting Modeling Human Height ⚖️\n", "\n", "- Weight of human is proportional to the **volume** of the human\n", " - Taller humans have more volume that shorter humans\n", " - Thus taller humans weight more (I know this is super-basic, that's the point)\n", "- Thus the weight of a human falls back on simple physics: **the shape of a human determines their mass, and thus their weight**\n", "- We can model the **shape** of a human in lots, of ways, but the simplest (albeit cartoonish) is probabily as a cylinder.\n", " - rare that people are wider than they are tall -- we can model radius of cylinder as proportion of height\n", " - push statistical estimation as far down the analysis pipeline as possible\n", "\n", "### Cartoonish Scientific model\n", "\n", "- Captures the causal relationships between height and weight\n", "- Causal because, it tells us how changing height will effect weight\n", "\n", "$$\n", "\\begin{align*}\n", "V &= \\pi r^2 h &\\text{Volume of cylinder} \\\\\n", "V &= \\pi (ph)^2 h &\\text{radius as proportion of height, } p \\\\\n", "W &= kV = k\\pi (ph)^2 h &\\text{converting volume to weight} \\\\\n", "W &= k\\pi p^2 h^3 \\\\\n", "\\end{align*}\n", "$$\n", "\n", "- $W$ weight (observed data, a' la' Howell dataset)\n", "- $h$ heith (observed data)\n", "- $k$ density (parameter to estimate)\n", "- $p$ proportion of height (parameter to estimate)\n", "\n", "### Statistical Model\n", "\n", "#### General conversion of the scientific model to statistical model\n", "\n", "$$\n", "\\begin{align*}\n", "W_i &\\sim \\text{Likelihood}(\\mu_i) &\\text{\"error\" distribution for } W\\\\\n", "\\mu_i &= k\\pi (ph)^2 H_i &\\text{expected value of } W_i | H_i \\\\\n", "p &\\sim \\text{Prior}(...) &\\text{distribution for proportionality} \\\\\n", "k &\\sim \\text{Prior}(...) &\\text{distribution for density} \\\\\n", "\\end{align*}\n", "$$\n", "\n", "\n", "#### Priors\n", "\n", "Now that parameters $p,k$ have real-world interpretations, determining priors for them is now more straight-forward than for a typical GLM/GLMM\n", "\n", "**Prior Development Workflow**\n", "1. Choose measurement scale (or removing them)\n", "2. Simulate\n", "3. Think" ] }, { "cell_type": "markdown", "id": "435d3ec7-3ea1-48dc-83e0-5047e0fc8081", "metadata": {}, "source": [ "**1. Measuremnt Scales**\n", "- $\\mu$ - $kg$\n", "- $H$ - $cm^3$\n", "- $k$ - $kg/cm^3$\n", "\n", "\n", "- You can always rescale a measurement scale using a reference\n", " - e.g. dividing out mean weight and height\n", "- Try dividing out measuremnt units when possible\n", "- Often easier to deal with dimensionless quantities\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "06cfe6d8-0548-44b7-aa8e-4088ca166ce4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| \n", " | height | \n", "weight | \n", "age | \n", "male | \n", "height_scaled | \n", "weight_scaled | \n", "
|---|---|---|---|---|---|---|
| 0 | \n", "151.765 | \n", "47.825606 | \n", "63.0 | \n", "1 | \n", "1.097650 | \n", "1.343015 | \n", "
| 1 | \n", "139.700 | \n", "36.485807 | \n", "63.0 | \n", "0 | \n", "1.010389 | \n", "1.024577 | \n", "
| 2 | \n", "136.525 | \n", "31.864838 | \n", "65.0 | \n", "0 | \n", "0.987425 | \n", "0.894813 | \n", "
| 3 | \n", "156.845 | \n", "53.041914 | \n", "41.0 | \n", "1 | \n", "1.134391 | \n", "1.489497 | \n", "
| 4 | \n", "145.415 | \n", "41.276872 | \n", "51.0 | \n", "0 | \n", "1.051723 | \n", "1.159117 | \n", "
| ... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
| 539 | \n", "145.415 | \n", "31.127751 | \n", "17.0 | \n", "1 | \n", "1.051723 | \n", "0.874114 | \n", "
| 540 | \n", "162.560 | \n", "52.163080 | \n", "31.0 | \n", "1 | \n", "1.175725 | \n", "1.464818 | \n", "
| 541 | \n", "156.210 | \n", "54.062497 | \n", "21.0 | \n", "0 | \n", "1.129798 | \n", "1.518157 | \n", "
| 542 | \n", "71.120 | \n", "8.051258 | \n", "0.0 | \n", "1 | \n", "0.514380 | \n", "0.226092 | \n", "
| 543 | \n", "158.750 | \n", "52.531624 | \n", "68.0 | \n", "1 | \n", "1.148169 | \n", "1.475167 | \n", "
544 rows × 6 columns
\n", "| \n", " | Year | \n", "Lynx | \n", "Hare | \n", "
|---|---|---|---|
| 0 | \n", "1900 | \n", "4.0 | \n", "30.0 | \n", "
| 1 | \n", "1901 | \n", "6.1 | \n", "47.2 | \n", "
| 2 | \n", "1902 | \n", "9.8 | \n", "70.2 | \n", "
| 3 | \n", "1903 | \n", "35.2 | \n", "77.4 | \n", "
| 4 | \n", "1904 | \n", "59.4 | \n", "36.3 | \n", "
| 5 | \n", "1905 | \n", "41.7 | \n", "20.6 | \n", "
| 6 | \n", "1906 | \n", "19.0 | \n", "18.1 | \n", "
| 7 | \n", "1907 | \n", "13.0 | \n", "21.4 | \n", "
| 8 | \n", "1908 | \n", "8.3 | \n", "22.0 | \n", "
| 9 | \n", "1909 | \n", "9.1 | \n", "25.4 | \n", "
| 10 | \n", "1910 | \n", "7.4 | \n", "27.1 | \n", "
| 11 | \n", "1911 | \n", "8.0 | \n", "40.3 | \n", "
| 12 | \n", "1912 | \n", "12.3 | \n", "57.0 | \n", "
| 13 | \n", "1913 | \n", "19.5 | \n", "76.6 | \n", "
| 14 | \n", "1914 | \n", "45.7 | \n", "52.3 | \n", "
| 15 | \n", "1915 | \n", "51.1 | \n", "19.5 | \n", "
| 16 | \n", "1916 | \n", "29.7 | \n", "11.2 | \n", "
| 17 | \n", "1917 | \n", "15.8 | \n", "7.6 | \n", "
| 18 | \n", "1918 | \n", "9.7 | \n", "14.6 | \n", "
| 19 | \n", "1919 | \n", "10.1 | \n", "16.2 | \n", "
| 20 | \n", "1920 | \n", "8.6 | \n", "24.7 | \n", "