Gumbel distribution

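Gumbel
	Probability density function (plot)
	Cumulative distribution function (plot)
Notation	${\text{Gumbel}}(\mu ,\beta )$
Parameters	$\mu$ , location (real); $\beta >0$ , scale (real)
Support	$x\in \mathbb {R}$
Probability density function	${\frac {1}{\beta }}e^{-(z+e^{-z})}$ ; where $z={\frac {x-\mu }{\beta }}$
Cumulative distribution function	$e^{-e^{-(x-\mu )/\beta }}$
Mean	$\mu +\beta \gamma$ ; where $\gamma$ is the Euler–Mascheroni constant
Median	$\mu -\beta \ln(\ln 2)$
Mode	$\mu$
Variance	${\frac {\pi ^{2}}{6}}\beta ^{2}$
Skewness	${\frac {12{\sqrt {6}}\,\zeta (3)}{\pi ^{3}}}\approx 1.14$
Excess kurtosis	${\frac {12}{5}}$
Entropy	$\ln(\beta )+\gamma +1$
Moment-generating function	$\Gamma (1-\beta t)e^{\mu t}$
Characteristic function	$\Gamma (1-i\beta t)e^{i\mu t}$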

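In probability theory and statistics, the Gumbel distribution (also known as the type-I generalized extreme value distribution) is used to model the distribution of the maximum (or the minimum) of a number of samples drawn from various distributions.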

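Given a list of the maximum water levels over the past ten years, this distribution can be used to represent the distribution of the highest level of a river in a particular year. It is useful for predicting the chance that an extreme earthquake, flood or other natural disaster will occur. The potential applicability of the Gumbel distribution to represent the distribution of maxima relates to extreme value theory, which indicates that it is likely to be useful if the distribution of the underlying sample data is of the normal or exponential type. This article uses the Gumbel distribution to model the distribution of the maximum value. To model the minimum value, use the negative of the original values.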

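The Gumbel distribution is a particular case of the generalized extreme value distribution (also known as the Fisher–Tippett distribution). It is also known as the log-Weibull distribution and the double exponential distribution (a term that is sometimes also used for the Laplace distribution). It is related to the Gompertz distribution: when its density is first reflected about the origin and then restricted to the positive half line, a Gompertz function is obtained.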

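In the latent variable formulation of the multinomial logit model, which is common in discrete choice theory, the errors of the latent variables follow a Gumbel distribution. This is useful because the difference of two independent Gumbel-distributed random variables with the same scale has a logistic distribution.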

The Gumbel distribution is named after Emil Julius Gumbel (1891–1966), based on his original papers describing the distribution. ^[1] ^[2]

Definitions

The cumulative distribution function of the Gumbel distribution is

F(x;\mu ,\beta )=e^{-e^{-(x-\mu )/\beta }}.\,

Standard Gumbel distribution

The standard Gumbel distribution is the special case $\mu =0$ and $\beta =1$ , with cumulative distribution function

F(x)=e^{-e^{(-x)}}\,

and probability density function

f(x)=e^{-(x+e^{-x})}.

In this case the mode is 0, the median is $-\ln(\ln(2))\approx 0.3665$ , the mean is $\gamma \approx 0.5772$ (the Euler–Mascheroni constant), and the standard deviation is $\pi /{\sqrt {6}}\approx 1.2825$ .
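The values above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the integration range are illustrative choices, not taken from any particular package.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def gumbel_cdf(x, mu=0.0, beta=1.0):
    """CDF F(x; mu, beta) = exp(-exp(-(x - mu) / beta))."""
    return math.exp(-math.exp(-(x - mu) / beta))


def gumbel_pdf(x, mu=0.0, beta=1.0):
    """PDF exp(-(z + exp(-z))) / beta, with z = (x - mu) / beta."""
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta


def standard_moments(lo=-10.0, hi=40.0, steps=200_000):
    """Midpoint-rule estimates of mean and standard deviation for mu=0, beta=1.

    The range [lo, hi] is an assumption: the density is negligible outside it.
    """
    h = (hi - lo) / steps
    m1 = m2 = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        w = gumbel_pdf(x) * h
        m1 += x * w
        m2 += x * x * w
    return m1, math.sqrt(m2 - m1 * m1)


median = -math.log(math.log(2.0))   # closed-form median of the standard case
mean, std = standard_moments()      # numerical mean and standard deviation
```

Here `gumbel_cdf(median)` should equal one half, and `mean` and `std` should agree with $\gamma$ and $\pi /{\sqrt {6}}$ up to the integration error.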

The cumulants, for n > 1, are given by

\kappa _{n}=(n-1)!\zeta (n)
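As a quick check, evaluating this formula for n = 2, 3, 4 with the standard values $\zeta (2)=\pi ^{2}/6$ , $\zeta (3)\approx 1.2021$ and $\zeta (4)=\pi ^{4}/90$ recovers the standard deviation quoted above, together with the usual skewness and excess kurtosis of the distribution:

```latex
\begin{aligned}
\kappa_{2} &= 1!\,\zeta(2) = \frac{\pi^{2}}{6} \approx 1.6449,
  &\sqrt{\kappa_{2}} &= \frac{\pi}{\sqrt{6}} \approx 1.2825,\\
\kappa_{3} &= 2!\,\zeta(3) \approx 2.4041,
  &\frac{\kappa_{3}}{\kappa_{2}^{3/2}} &\approx 1.14,\\
\kappa_{4} &= 3!\,\zeta(4) = \frac{\pi^{4}}{15} \approx 6.4939,
  &\frac{\kappa_{4}}{\kappa_{2}^{2}} &= \frac{36}{15} = \frac{12}{5}.
\end{aligned}
```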

Properties

The mode is $\mu$ , the median is $\mu -\beta \ln \left(\ln 2\right)$ , and the mean is

\operatorname {E} (X)=\mu +\gamma \beta ,

where $\gamma$ is the Euler–Mascheroni constant.

The standard deviation $\sigma$ is $\beta \pi /{\sqrt {6}}$ , hence $\beta =\sigma {\sqrt {6}}/\pi \approx 0.78\sigma .$ ^[3]

At the mode, where $x=\mu$ , the value of $F(x;\mu ,\beta )$ becomes $e^{-1}\approx 0.37$ , irrespective of the value of $\beta$ .
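The relations for the mean and the standard deviation can be inverted to obtain rough parameter values from a given mean and standard deviation (a method-of-moments fit). The sketch below is illustrative only; the function names `gumbel_summary` and `gumbel_from_mean_std` and the example parameters are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def gumbel_summary(mu, beta):
    """Mode, median, mean and standard deviation for Gumbel(mu, beta)."""
    return {
        "mode": mu,
        "median": mu - beta * math.log(math.log(2.0)),
        "mean": mu + EULER_GAMMA * beta,
        "std": beta * math.pi / math.sqrt(6.0),
    }


def gumbel_from_mean_std(mean, std):
    """Invert the mean/std relations: beta = std*sqrt(6)/pi, mu = mean - gamma*beta."""
    beta = std * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


# Round trip with hypothetical parameters mu = 10, beta = 2.
summary = gumbel_summary(mu=10.0, beta=2.0)
mu_hat, beta_hat = gumbel_from_mean_std(summary["mean"], summary["std"])
```

The round trip should return the parameters it started from, and for any $\beta >0$ the mode, median and mean appear in that increasing order.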

Applications

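Gumbel showed that the maximum value (or last order statistic) in a sample of exponentially distributed random variables, minus the natural logarithm of the sample size, ^[7] approaches the Gumbel distribution as the sample size increases. ^[8]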

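Concretely, let $\rho (x)=e^{-x}$ be the probability density of $x$ and $Q(x)=1-e^{-x}$ its cumulative distribution. The maximum of $N$ realizations of $x$ is smaller than $X$ if and only if every realization of $x$ is smaller than $X$ . The cumulative distribution of the maximum value ${\tilde {x}}$ therefore satisfies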

P({\tilde {x}}-\log(N)\leq X)=P({\tilde {x}}\leq X+\log(N))=[Q(X+\log(N))]^{N}=\left(1-{\frac {e^{-X}}{N}}\right)^{N}

and, for large $N$ , the right-hand side converges to $e^{-e^{(-X)}}$ .
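The convergence can be illustrated numerically by comparing the exact expression $\left(1-e^{-X}/N\right)^{N}$ with its Gumbel limit for increasing $N$ . This is a small sketch; the grid of $X$ values and the function names are arbitrary choices made for illustration.

```python
import math


def exp_max_cdf(x, n):
    """P(max of n Exp(1) draws - log(n) <= x) = (1 - exp(-x)/n)**n.

    The probability is 0 for x < -log(n), where x + log(n) is negative.
    """
    if x < -math.log(n):
        return 0.0
    return (1.0 - math.exp(-x) / n) ** n


def standard_gumbel_cdf(x):
    """Limiting CDF exp(-exp(-x))."""
    return math.exp(-math.exp(-x))


def max_abs_gap(n):
    """Largest absolute difference between the two CDFs on a grid over [-2, 6]."""
    grid = [-2.0 + 0.1 * k for k in range(81)]
    return max(abs(exp_max_cdf(x, n) - standard_gumbel_cdf(x)) for x in grid)


# The gap shrinks as the sample size N grows.
gaps = {n: max_abs_gap(n) for n in (10, 100, 1000, 10000)}
```

The entries of `gaps` decrease as $N$ grows, roughly in proportion to $1/N$ .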

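In hydrology, therefore, the Gumbel distribution is used to analyze variables such as the monthly and annual maxima of daily rainfall and river discharge, ^[3] and also to describe droughts. ^[9]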

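Gumbel also showed that the estimator $r/(n+1)$ for the probability of an event, where $r$ is the rank of the observed value in the data series and $n$ is the total number of observations, is an unbiased estimator of the cumulative probability around the mode of the distribution. This estimator is therefore often used as a plotting position.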
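A plotting position of this kind takes only a few lines to compute. The sketch below assumes that ranks are assigned in ascending order (rank 1 for the smallest observation), so that $r/(n+1)$ estimates the non-exceedance probability; the data values are made up for illustration.

```python
def plotting_positions(values):
    """Return (value, r / (n + 1)) pairs, with rank r = 1 for the smallest value."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, r / (n + 1)) for r, v in enumerate(ordered, start=1)]


# Hypothetical annual maxima (illustrative numbers only).
annual_maxima = [4.1, 3.2, 5.6, 2.9, 4.8]
positions = plotting_positions(annual_maxima)
```

Because the denominator is $n+1$ rather than $n$ , every estimated probability lies strictly between 0 and 1, which keeps the largest observation plottable on a double-log axis.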

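In number theory, the Gumbel distribution approximates the number of terms in a random partition of an integer ^[10] as well as the trend-adjusted sizes of maximal prime gaps and of maximal gaps between prime constellations. ^[11]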

Gumbel reparameterization tricks

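In machine learning, the Gumbel distribution is sometimes used to generate samples from a categorical distribution. This technique is called the "Gumbel-max trick" and is a special case of the "reparameterization trick". ^[12]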

In detail, let $(\pi _{1},...,\pi _{n})$ be nonnegative and not all zero, and let $g_{1},...,g_{n}$ be independent samples of Gumbel(0, 1). Then $\Pr(j=\arg \max _{i}(g_{i}+\log \pi _{i}))={\frac {\pi _{j}}{\sum _{i}\pi _{i}}}$ . That is, $\arg \max _{i}(g_{i}+\log \pi _{i})\sim {\text{Categorical}}\left({\frac {\pi _{j}}{\sum _{i}\pi _{i}}}\right)_{j}$ .

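Equivalently, given any $x_{1},...,x_{n}\in \mathbb {R}$ , we can sample from its Boltzmann distribution by $\Pr(j=\arg \max _{i}(g_{i}+x_{i}))={\frac {e^{x_{j}}}{\sum _{i}e^{x_{i}}}}$ . Related identities include the following, in which the $g_{i}$ are instead taken to be independent ${\text{Gumbel}}(-\gamma ,1)$ samples, that is, standard Gumbel samples shifted to have zero mean (the shift leaves the arg max unchanged): ^[13]

If $x\sim \operatorname {Exp} (\lambda )$ , then $(-\ln x-\gamma )\sim {\text{Gumbel}}(-\gamma +\ln \lambda ,1)$ .
$\arg \max _{i}(g_{i}+\log \pi _{i})\sim {\text{Categorical}}\left({\frac {\pi _{j}}{\sum _{i}\pi _{i}}}\right)_{j}$ .
$\max _{i}(g_{i}+\log \pi _{i})\sim {\text{Gumbel}}\left(-\gamma +\log \left(\sum _{i}\pi _{i}\right),1\right)$ . That is, the Gumbel distribution is a max-stable distribution family.
$\mathbb {E} [\max _{i}(g_{i}+\beta x_{i})]=\log \left(\sum _{i}e^{\beta x_{i}}\right)$ .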
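The Gumbel-max trick can be sketched with the Python standard library alone. The function names, the example weights and the fixed seed below are illustrative assumptions; the standard Gumbel draws are generated as $-\ln(-\ln U)$ with $U$ uniform on $(0,1)$ .

```python
import math
import random


def sample_standard_gumbel(rng):
    """Draw from Gumbel(0, 1) as -log(-log(U)), U uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0; avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(weights, rng):
    """Return index j with probability weights[j] / sum(weights).

    Weights must be nonnegative and not all zero; they need not be normalized.
    """
    best_index, best_score = -1, -math.inf
    for i, w in enumerate(weights):
        if w < 0.0:
            raise ValueError("weights must be nonnegative")
        if w == 0.0:
            continue  # log(0) = -inf, so this index can never be the arg max
        score = sample_standard_gumbel(rng) + math.log(w)
        if score > best_score:
            best_index, best_score = i, score
    if best_index < 0:
        raise ValueError("at least one weight must be positive")
    return best_index


# Empirical check with hypothetical unnormalized weights.
rng = random.Random(0)
weights = [1.0, 2.0, 0.0, 5.0]
draws = 20_000
counts = [0] * len(weights)
for _ in range(draws):
    counts[gumbel_max_sample(weights, rng)] += 1
frequencies = [c / draws for c in counts]
```

The empirical `frequencies` should be close to the normalized weights 0.125, 0.25, 0 and 0.625. No normalizing constant is computed anywhere, which is the practical appeal of the trick.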

Random variate generation

The quantile function (inverse cumulative distribution function) $Q(p)$ of the Gumbel distribution is given by

Q(p)=\mu -\beta \ln(-\ln(p)),

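where $\mu$ and $\beta$ are the parameters. When the random variate $U$ is drawn from the uniform distribution on the interval $(0,1)$ , the variate $Q(U)$ has a Gumbel distribution with parameters $\mu$ and $\beta$ .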
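This inverse-transform construction can be written directly from the quantile function. The following is a minimal sketch using the standard library; the function names, seed and example parameters are illustrative assumptions.

```python
import math
import random


def gumbel_quantile(p, mu=0.0, beta=1.0):
    """Quantile function Q(p) = mu - beta * ln(-ln(p)) for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return mu - beta * math.log(-math.log(p))


def gumbel_sample(n, mu=0.0, beta=1.0, seed=None):
    """Draw n Gumbel(mu, beta) variates by applying Q to uniform draws."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # skip the endpoint 0.0, where Q is undefined
            draws.append(gumbel_quantile(u, mu, beta))
    return draws


# Hypothetical parameters mu = 3, beta = 2.
sample = gumbel_sample(50_000, mu=3.0, beta=2.0, seed=1)
sample_mean = sum(sample) / len(sample)
```

For these parameters the sample mean should be near $\mu +\gamma \beta \approx 4.154$ , and `gumbel_quantile(0.5, 3.0, 2.0)` reproduces the median $\mu -\beta \ln(\ln 2)$ .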

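Probability paper

In pre-software times, probability paper was used to picture the Gumbel distribution (see illustration). The paper is based on a linearization of the cumulative distribution function $F$ :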

-\ln[-\ln(F)]=(x-\mu )/\beta

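On the paper, the horizontal axis is constructed on a double-log scale and the vertical axis is linear. With $F$ plotted on the horizontal axis and $x$ on the vertical axis, the Gumbel distribution appears as a straight line, along which $-\ln[-\ln(F)]$ increases by $1/\beta$ per unit of $x$ . When distribution-fitting software such as CumFreq became available, the task of plotting the distribution became easier.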
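The same linearization can be used numerically: transforming cumulative probabilities with $y=-\ln[-\ln(F)]$ and fitting the straight line $x=\mu +\beta y$ by least squares gives estimates of both parameters. The sketch below is one simple way to do this, not a description of how CumFreq or any other program works; the function name and the test data are hypothetical.

```python
import math


def gumbel_paper_fit(xs, probs):
    """Least-squares fit of x = mu + beta * y, where y = -ln(-ln(F)).

    xs    -- observed values
    probs -- their cumulative probabilities F, each strictly in (0, 1)
    """
    ys = [-math.log(-math.log(p)) for p in probs]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    beta = sxy / syy
    mu = x_bar - beta * y_bar
    return mu, beta


# Exact quantiles of a hypothetical Gumbel(10, 2) at probabilities r / (n + 1).
probs = [r / 11 for r in range(1, 11)]
xs = [10.0 - 2.0 * math.log(-math.log(p)) for p in probs]
mu_fit, beta_fit = gumbel_paper_fit(xs, probs)
```

Because the input points here lie exactly on the line, the fit recovers $\mu =10$ and $\beta =2$ up to rounding; with real data the points scatter around the line and the fit gives only rough estimates.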

References

[1] Gumbel, E.J. Les valeurs extrêmes des distributions statistiques (PDF). Annales de l'Institut Henri Poincaré. 1935, 5 (2): 115–158 [retrieved 2023-01-21]. (Archived (PDF) from the original on 2018-03-10.)
[2] Gumbel, E.J. The return period of flood flows. The Annals of Mathematical Statistics. 1941, 12: 163–190.
[3] Oosterbaan, R.J. http://www.waterlog.info/pdf/freqtxt.pdf (PDF). In: Ritzema, H.P. (ed.). Drainage Principles and Applications, Publication 16. Wageningen, The Netherlands: International Institute for Land Reclamation and Improvement (ILRI). 1994: 175–224. ISBN 90-70754-33-9.
[4] Willemse, W.J.; Kaas, R. Rational reconstruction of frailty-based mortality models by a generalisation of Gompertz' law of mortality (PDF). Insurance: Mathematics and Economics. 2007, 40 (3): 468 [retrieved 2023-01-21]. doi:10.1016/j.insmatheco.2006.07.003. (Archived (PDF) from the original on 2017-08-09.)
[5] Marques, F.; Coelho, C.; de Carvalho, M. On the distribution of linear combinations of independent Gumbel random variables (PDF). Statistics and Computing. 2015, 25: 683–701 [retrieved 2023-01-21]. doi:10.1007/s11222-014-9453-5. (Archived (PDF) from the original on 2022-12-20.)
[6] CumFreq, software for probability distribution fitting.
[7] user49229. Gumbel distribution and exponential distribution. [retrieved 2023-01-21]. (Archived from the original on 2021-08-26.)
[8] Gumbel, E.J. Statistical theory of extreme values and some practical applications. Applied Mathematics Series 33, 1st ed. U.S. Department of Commerce, National Bureau of Standards. 1954 [retrieved 2023-01-21]. ASIN B0007DSHG4. (Archived from the original on 2023-01-21.)
[9] Burke, Eleanor J.; Perry, Richard H.J.; Brown, Simon J. An extreme value analysis of UK drought and projections of change in the future. Journal of Hydrology. 2010, 388 (1–2): 131–143. Bibcode:2010JHyd..388..131B. doi:10.1016/j.jhydrol.2010.04.035.
[10] Erdős, Paul; Lehner, Joseph. The distribution of the number of summands in the partitions of a positive integer. Duke Mathematical Journal. 1941, 8 (2): 335. doi:10.1215/S0012-7094-41-00826-8.
[11] Kourbatov, A. Maximal gaps between prime k-tuples: a statistical approach. Journal of Integer Sequences. 2013, 16. Bibcode:2013arXiv1301.2242K. arXiv:1301.2242. Article 13.5.2.
[12] Jang, Eric; Gu, Shixiang; Poole, Ben. Categorical Reparameterization with Gumbel-Softmax. International Conference on Learning Representations (ICLR) 2017. April 2017 [retrieved 2023-01-21]. (Archived from the original on 2023-01-21.)
[13] Balog, Matej; Tripuraneni, Nilesh; Ghahramani, Zoubin; Weller, Adrian. Lost Relatives of the Gumbel Trick. International Conference on Machine Learning (PMLR). 2017-07-17: 371–379 [retrieved 2023-01-21]. (Archived from the original on 2023-01-21.)
