{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Probability Distribution\n",
"\n",
"확률 분포"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"python ver=3.8.9 (default, Jun 27 2021, 02:41:12) \n",
"[GCC 7.5.0]\n",
"pandas ver=1.2.5\n",
"numpy ver=1.19.5\n",
"scipy ver=1.6.3\n"
]
}
],
"source": [
"# 경고 메시지 출력 끄기\n",
"import warnings \n",
"warnings.filterwarnings(action='ignore')\n",
"\n",
"# 노트북 셀 표시를 브라우저 전체 폭 사용하기\n",
"from IPython.core.display import display, HTML\n",
"display(HTML(\"\"))\n",
"from IPython.display import clear_output\n",
"\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import os, sys, shutil, functools\n",
"import collections, pathlib, re, string\n",
"\n",
"rseed = 22\n",
"import random\n",
"random.seed(rseed)\n",
"\n",
"import numpy as np\n",
"np.random.seed(rseed)\n",
"np.set_printoptions(precision=5)\n",
"np.set_printoptions(formatter={'float_kind': \"{:.5f}\".format})\n",
"\n",
"import pandas as pd\n",
"pd.set_option('display.max_rows', None) \n",
"pd.set_option('display.max_columns', None) \n",
"pd.set_option('display.max_colwidth', None)\n",
"pd.options.display.float_format = '{:,.5f}'.format\n",
"\n",
"import scipy as sp\n",
"\n",
"import seaborn as sns\n",
"\n",
"from pydataset import data\n",
"\n",
"print(f\"python ver={sys.version}\")\n",
"print(f\"pandas ver={pd.__version__}\")\n",
"print(f\"numpy ver={np.__version__}\")\n",
"print(f\"scipy ver={sp.__version__}\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Int64Index: 150 entries, 1 to 150\n",
"Data columns (total 5 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Sepal.Length 150 non-null float64\n",
" 1 Sepal.Width 150 non-null float64\n",
" 2 Petal.Length 150 non-null float64\n",
" 3 Petal.Width 150 non-null float64\n",
" 4 Species 150 non-null object \n",
"dtypes: float64(4), object(1)\n",
"memory usage: 7.0+ KB\n"
]
}
],
"source": [
"# Iris 데이터 셋의 컬럼 정보 살피기\n",
"df_iris = data('iris')\n",
"df_iris.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 확률과 확률 분포 (Probability and Probability Distribution)\n",
"\n",
"주사위를 실제 던져보기 전까지는 주사위의 어떤 수가 나오게 될지 알수 없지만 주사위를 던졌을 때 각 수가 나오는 가능성은 1/6 으로 예측 할 수 있습니다. 추측 통계에서는 결과 예측시 확률과 확률의 분포를 이용합니다.\n",
"\n",
"* 사상: 시행 (실험, 관측 등)에 의해 생긴 결과 (주사위 예에서는 던져서 나온 수)\n",
"* 확률: 사상이 어느정도 일어나기 쉬운지를 수치화, 모든 사상의 합은 1 (주사위 예에서는 각 눈의 확률)\n",
"* 확률변수: 시행해 봐야 결과를 알 수 있는 변수를 확률 변수라 정의\n",
" * 이산확률변수: 변수가 취할 수 있는 값이 이산형인 확률 변수 (예, 주사위, 동전 등)\n",
" * 연속확률변수: 변수가 취할 수 있는 값이 연속형인 활률 변수 (예, 키, 몸무게 등)\n",
"* 확률분포: 확률 변수(전체 합 = 1)가 취할 수 있는 값이 어떻게 분포하고 있는지 나타냄\n",
"\n",
"\n",
"\n",
"https://tinyheero.github.io/2016/03/17/prob-distr.html\n",
"\n",
"**Further Reading**\n",
"* [Probability Distributions in Python Tutorial](https://www.datacamp.com/community/tutorials/probability-distributions-python)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**도수분포표 (Frequency Distribution Table)**\n",
"\n",
"도수분포표는 데이터를 계급으로 나누고 계급에 대한 빈도를 세어 표시하는 테이블입니다. 범주형 변수의 경우는 간단하게 구할 수 있지만, 연속형 변수의 경우에는 어떻게 계급을 나누어 표시해야 하는지에 대한 어려운 부분이 존재합니다. \n",
"\n",
"연속형 변수의 적장한 계급 수를 구하는 방법은 주로 Sturge’s rule 또는 Freedman-Diaconis rule 을 사용합니다.\n",
"\n",
"**Sturge's rule**\n",
"\n",
"\n",
"\n",
"**Freedman-Diaconis rule**\n",
"\n",
"\n",
"\n",
"https://medium.datadriveninvestor.com/how-to-decide-on-the-number-of-bins-of-a-histogram-3c36dc5b1cd8\n"
]
},
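{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Worked example: Sturges' rule for the Iris data**\n",
"\n",
"As a rough hand check of the class count, Sturges' rule $k = \\lceil \\log_2 n + 1 \\rceil$ can be evaluated for the $n = 150$ rows of the Iris data set:\n",
"\n",
"$$\\log_2 150 \\approx 7.23, \\qquad k = \\lceil 7.23 + 1 \\rceil = \\lceil 8.23 \\rceil = 9$$\n",
"\n",
"Under this rule, a continuous column of this data set such as `Sepal.Length` is split into 9 classes."
]
},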
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" col_0 \n",
" frequency \n",
" cumulative_frequency \n",
" relative_frequency \n",
" \n",
" \n",
" Species \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" setosa \n",
" 50 \n",
" 50 \n",
" 0.33333 \n",
" \n",
" \n",
" versicolor \n",
" 50 \n",
" 100 \n",
" 0.33333 \n",
" \n",
" \n",
" virginica \n",
" 50 \n",
" 150 \n",
" 0.33333 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
"col_0 frequency cumulative_frequency relative_frequency\n",
"Species \n",
"setosa 50 50 0.33333\n",
"versicolor 50 100 0.33333\n",
"virginica 50 150 0.33333"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_iris = data('iris')\n",
"\n",
"# 범주형 변수의 도수 분포표\n",
"fdt = pd.crosstab(index=df_iris['Species'], columns='frequency')\n",
"fdt['cumulative_frequency'] = fdt['frequency'].cumsum() # 누적도수\n",
"fdt['relative_frequency'] = fdt['frequency'] / fdt['frequency'].sum() # 상대도수\n",
"fdt"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" col_0 \n",
" frequency \n",
" cumulative_frequency \n",
" relative_frequency \n",
" cumulative_relative_frequency \n",
" \n",
" \n",
" Sepal.Length \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" [4.3, 4.7) \n",
" 9 \n",
" 9 \n",
" 0.06000 \n",
" 0.06000 \n",
" \n",
" \n",
" [4.7, 5.1) \n",
" 23 \n",
" 32 \n",
" 0.15333 \n",
" 0.21333 \n",
" \n",
" \n",
" [5.1, 5.5) \n",
" 20 \n",
" 52 \n",
" 0.13333 \n",
" 0.34667 \n",
" \n",
" \n",
" [5.5, 5.9) \n",
" 28 \n",
" 80 \n",
" 0.18667 \n",
" 0.53333 \n",
" \n",
" \n",
" [5.9, 6.3) \n",
" 28 \n",
" 108 \n",
" 0.18667 \n",
" 0.72000 \n",
" \n",
" \n",
" [6.3, 6.7) \n",
" 14 \n",
" 122 \n",
" 0.09333 \n",
" 0.81333 \n",
" \n",
" \n",
" [6.7, 7.1) \n",
" 17 \n",
" 139 \n",
" 0.11333 \n",
" 0.92667 \n",
" \n",
" \n",
" [7.1, 7.5) \n",
" 5 \n",
" 144 \n",
" 0.03333 \n",
" 0.96000 \n",
" \n",
" \n",
" [7.5, 7.904) \n",
" 6 \n",
" 150 \n",
" 0.04000 \n",
" 1.00000 \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
"col_0 frequency cumulative_frequency relative_frequency \\\n",
"Sepal.Length \n",
"[4.3, 4.7) 9 9 0.06000 \n",
"[4.7, 5.1) 23 32 0.15333 \n",
"[5.1, 5.5) 20 52 0.13333 \n",
"[5.5, 5.9) 28 80 0.18667 \n",
"[5.9, 6.3) 28 108 0.18667 \n",
"[6.3, 6.7) 14 122 0.09333 \n",
"[6.7, 7.1) 17 139 0.11333 \n",
"[7.1, 7.5) 5 144 0.03333 \n",
"[7.5, 7.904) 6 150 0.04000 \n",
"\n",
"col_0 cumulative_relative_frequency \n",
"Sepal.Length \n",
"[4.3, 4.7) 0.06000 \n",
"[4.7, 5.1) 0.21333 \n",
"[5.1, 5.5) 0.34667 \n",
"[5.5, 5.9) 0.53333 \n",
"[5.9, 6.3) 0.72000 \n",
"[6.3, 6.7) 0.81333 \n",
"[6.7, 7.1) 0.92667 \n",
"[7.1, 7.5) 0.96000 \n",
"[7.5, 7.904) 1.00000 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 연속형 변수의 도수 분포표\n",
"# Sturges's Rule 에 따른 Binning \n",
"n_data = df_iris['Sepal.Length'].count()\n",
"n_bins = int(np.ceil(np.log2(n_data) + 1))\n",
"bins = pd.cut(x=df_iris['Sepal.Length'], bins=n_bins, right=False)\n",
"fdt = pd.crosstab(index=bins, columns='frequency')\n",
"fdt['cumulative_frequency'] = fdt['frequency'].cumsum() # 누적도수\n",
"fdt['relative_frequency'] = fdt['frequency'] / fdt['frequency'].sum() # 상대도수\n",
"fdt['cumulative_relative_frequency'] = fdt['relative_frequency'].cumsum() # 누적상대도수\n",
"fdt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**히스토그램 (Histogram)**\n",
"\n",
"도수분포표를 통해 구한 계급 구간을 이용하여 히스토그램으로 데이터를 시각화 하여 살펴 봅니다. 히스토그램을 통한 데이터 시각화를 하면 좀 더 데이터 분포에 대하여 직관적으로 살펴볼 수 있습니다."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXIAAAD4CAYAAADxeG0DAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAOHklEQVR4nO3dbYxc5X2G8euu7TQJQYHULtlibKMIIVGpAbqipEQRLUlEqQWJiiqQSiEKMm1DG9pIheZDG/VTkPJS9UWkDtDQFmhSAo1rkRREI6FILaohLhicCEpszNbGJqhAX8Hm3w97TJbN7s54d2ZnHvv6SaM9c86zM7cfWfeeffbMTKoKSVK7fmTUASRJS2ORS1LjLHJJapxFLkmNs8glqXErl/PJVq9eXRs2bFjOp5Sk5j388MPPV9Wa+Y4va5Fv2LCBbdu2LedTSlLzkuxe6LhLK5LUOItckhpnkUtS4yxySWqcRS5JjbPIJalxFrkkNc4il6TGWeSS1LhlfWWnjj4Ta9exb2rPSDOseNObOfTK/440A8A7Tz6Fvc8+M+oYOgZZ5FqSfVN7WH/91pFm2H3jxpFnOJxDGgWXViSpcRa5JDXOIpekxlnkktQ4i1ySGmeRS1LjLHJJapxFLkmNs8glqXEWuSQ1ziKXpMZZ5JLUOItckhpnkUtS4yxySWqcRS5JjbPIJalxPYs8ySlJvpnkiSSPJ/l4t/9TSaaSbO9uFw0/riRptn4+6u0g8ImqeiTJ8cDDSe7vjn2+qj4zvHiSpF56FnlV7QX2dtsvJ9kJnDzsYJKk/hzRGnmSDcBZwEPdrmuTPJrk1iQnzvM9m5JsS7LtwIEDS0srSfohfRd5krcBXwWuq6qXgJuAdwFnMn3G/tm5vq+qNlfVZFVNrlmzZumJJUlv0FeRJ1nFdInfXlV3A1TVc1V1qKpeA74InDO8mJKk+fRz1UqAW4CdVfW5GfsnZgz7MLBj8PEkSb30c9XKecAVwGNJtnf7PglcnuRMoIBdwDVDyCdJ6qGfq1a+BWSOQ/cOPo4k6Uj5yk5JapxFLkmNs8glqXEWuSQ1ziKXpMZZ5JLUOItckhpnkUtS4yxySWqcRX6EJtauI8nIbxNr1416KiSNiX7ea0Uz7Jvaw/rrt446Brtv3DjqCJLGhGfkktQ4i1ySGmeRS1LjLHJJapxFLkmNs8glqXEWuSQ1ziKXpMZZ5JLUOItckhpnkUtS4yxySWqcRS5JjbPIJalxFrkkNc4il6TGWeSS1LieRZ7klCTfTPJEkseTfLzb/44k9yd5svt64vDjSpJm6+eM/CDwiao6AzgX+FiSM4AbgAeq6jTgge6+JGmZ9SzyqtpbVY902y8DO4GTgUuA27phtwEfGlJGSdICjujDl5NsAM4CHgJOqqq93aF9wEnzfM8mYBPAunV+8vvArFhFklGnkDQG+i7yJG8DvgpcV1UvzSyRqqokNdf3VdVmYDPA5OTknGO0CIdeZf31W0edgt03bhx1BOmY19dVK0lWMV3it1fV3d3u55JMdMcngP3DiShJWkg/V60EuAXYWVWfm3FoC3Blt30l8LXBx5Mk9dLP0sp5wBXAY0m2d/s+CXwa+EqSjwK7gV8eSkJJ0oJ6FnlVfQuY769qFww2jiTpSPnKTklqnEUuSY2zyCWpcRa5JDXOIpekxlnkktQ4i1ySGmeRS1LjLHJJapxFLkmNs8glqXEWuSQ1ziKXpMZZ5JLUOItckhpnkUtS4yxy6SgzsXYdSUZ+m1i7btRTcczo56PeJDVk39Qe1l+/ddQx2H3jxlFHOGZ4Ri5JjbPIJalxFrkkNc4il6TGWeSS1DiLXJIaZ5FLUuMscklqnEUuSY2zyCWpcT2LPMmtSfYn2TFj36eSTCXZ3t0uGm5MSdJ8+jkj/xJw4Rz7P19VZ3a3ewcbS5LUr55FXlUPAi8sQxZJ0iIsZY382iSPdksvJ843KMmmJNuSbDtw4MASnk6SNJ
fFFvlNwLuAM4G9wGfnG1hVm6tqsqom16xZs8inkyTNZ1FFXlXPVdWhqnoN+CJwzmBjSZL6tagiTzIx4+6HgR3zjZUkDVfPTwhKcidwPrA6ybPAHwDnJzkTKGAXcM3wIkqSFtKzyKvq8jl23zKELJKkRfCVnZLUOD98WRqUFatIMuoUOgZZ5NKgHHrVT6/XSLi0IkmNs8glqXEWuSQ1ziKXpMZZ5JLUOItckhpnkUtS4yxySWqcRS5JjbPIJalxFrkkNc4il6TGWeSS1DiLXJIaZ5FLUuMscklqnEUuSY2zyCWpcRa5JDXOIpekxlnkktQ4i1ySGmeRS1LjLHJJapxFLkmN61nkSW5Nsj/Jjhn73pHk/iRPdl9PHG5MSdJ8+jkj/xJw4ax9NwAPVNVpwAPdfUnSCPQs8qp6EHhh1u5LgNu67duADw02liSpX4tdIz+pqvZ22/uAk+YbmGRTkm1Jth04cGCRTydJms+S/9hZVQXUAsc3V9VkVU2uWbNmqU8nSZplsUX+XJIJgO7r/sFFkiQdicUW+Rbgym77SuBrg4kjSTpS/Vx+eCfwT8DpSZ5N8lHg08AHkjwJvL+7L0kagZW9BlTV5fMcumDAWSRJi+ArOyWpcT3PyMfFxNp17JvaM+oYkjR2minyfVN7WH/91lHHYPeNG0cdQZLewKUVSWqcRS5JjbPIJalxFrkkNc4il6TGWeSS1DiLXJIaZ5FLUuMscklqnEUuSY2zyCWpcRa5JDXOIpekxlnkktQ4i1ySGmeRS1LjLHJJapxFLkmNs8glqXEWuSQ1ziKXdFSbWLuOJCO/TaxdN7R/48qhPbIkjYF9U3tYf/3WUcdg940bh/bYnpFLUuMscklqnEUuSY1b0hp5kl3Ay8Ah4GBVTQ4ilCSpf4P4Y+fPVdXzA3gcSdIiuLQiSY1b6hl5AfclKeDPq2rz7AFJNgGbANatG951lJLGzIpVJBl1imPCUov8vVU1leTHgfuTfKeqHpw5oCv3zQCTk5O1xOeT1IpDrx7112+PiyUtrVTVVPd1P3APcM4gQkmS+rfoIk9yXJLjD28DHwR2DCqYJKk/S1laOQm4p1sDWwncUVXfGEgqSVLfFl3kVfU08O4BZpEkLYKXH0pS4yxySWqcRS5JjbPIJalxFrkkNc4il6TGWeSS1DiLXJIaZ5FLUuMscklqnEUuSY2zyCWpcRa5JDXOIpekxlnkktQ4i1ySGmeRS1LjLHJJapxFLkmNs8glqXEWuSQ1ziKXpMZZ5JLUOItckhpnkUtS4yxySWqcRS5JjbPIJalxSyryJBcm+W6Sp5LcMKhQkqT+LbrIk6wA/gz4BeAM4PIkZwwqmCSpP0s5Iz8HeKqqnq6qV4C/AS4ZTCxJUr9SVYv7xuRS4MKqurq7fwXwM1V17axxm4BN3d3Tge8uPu6SrQaeH+Hz96uVnNBOVnMOVis5oZ2sC+VcX1Vr5vvGlcPJ8wNVtRnYPOzn6UeSbVU1OeocvbSSE9rJas7BaiUntJN1KTmXsrQyBZwy4/7abp8kaRktpcj/BTgtyalJ3gRcBmwZTCxJUr8WvbRSVQeTXAv8A7ACuLWqHh9YsuEYiyWePrSSE9rJas7BaiUntJN10TkX/cdOSdJ48JWdktQ4i1ySGnfUFnmSFUm+nWTrHMeuSnIgyfbudvWIMu5K8liXYdscx5Pkj7u3QHg0ydljmvP8JC/OmM/fH0XOLssJSe5K8p0kO5O8Z9bxcZnTXjlHPqdJTp/x/NuTvJTkulljxmU++8k68jntcvx2kseT7EhyZ5I3zzr+o0m+3M3pQ0k29HzQqjoqb8DvAHcAW+c4dhXwp2OQcReweoHjFwFfBwKcCzw0pjnPn2ueR5T1NuDqbvtNwAljOqe9co7NnHZ5VgD7mH5hytjNZ59ZRz6nwMnA94C3dPe/Alw1a8xvAF/oti8DvtzrcY/KM/Ika4FfBG4edZYlugT4y5r2z8AJSSZGHWpcJXk78D7gFo
CqeqWq/mPWsJHPaZ85x80FwL9V1e5Z+0c+n3OYL+u4WAm8JclK4K3Av886fgnTP+gB7gIuSJKFHvCoLHLgj4DfBV5bYMwvdb8K3pXklAXGDVMB9yV5uHsrg9lOBvbMuP9st2+59coJ8J4k/5rk60l+cjnDzXAqcAD4i25Z7eYkx80aMw5z2k9OGI85Pewy4M459o/DfM42X1YY8ZxW1RTwGeAZYC/wYlXdN2vY63NaVQeBF4EfW+hxj7oiT7IR2F9VDy8w7O+BDVX1U8D9/OCn33J7b1WdzfQ7SH4syftGlKOXXjkfYfrX2HcDfwL83TLnO2wlcDZwU1WdBfwXMI5vr9xPznGZUzL9gr+Lgb8dVYZ+9cg68jlNciLTZ9ynAj8BHJfkV5b6uEddkQPnARcn2cX0OzL+fJK/njmgqr5fVf/X3b0Z+Onljfh6jqnu637gHqbfUXKmsXgbhF45q+qlqvrPbvteYFWS1cudk+mzwWer6qHu/l1MF+ZM4zCnPXOO0ZzC9A/wR6rquTmOjcN8zjRv1jGZ0/cD36uqA1X1KnA38LOzxrw+p93yy9uB7y/0oEddkVfV71XV2qrawPSvWP9YVW/4iTdrDe9iYOcyRjyc4bgkxx/eBj4I7Jg1bAvwq92VAecy/WvY3nHLmeSdh9fwkpzD9P+rBf/jDUNV7QP2JDm923UB8MSsYSOf035yjsucdi5n/qWKkc/nLPNmHZM5fQY4N8lbuywX8MP9swW4stu+lOkOW/CVm0N/98NxkeQPgW1VtQX4rSQXAweBF5i+imW5nQTc0/2/WgncUVXfSPJrAFX1BeBepq8KeAr4b+AjY5rzUuDXkxwE/ge4rNd/vCH6TeD27lfsp4GPjOGc9pNzLOa0++H9AeCaGfvGcT77yTryOa2qh5LcxfQyz0Hg28DmWf10C/BXSZ5iup8u6/W4vkRfkhp31C2tSNKxxiKXpMZZ5JLUOItckhpnkUtS4yxySWqcRS5Jjft/pOud6c8up0cAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Pandas를 이용한 시각화\n",
"df_iris['Sepal.Length'].hist(bins=n_bins, grid=False, edgecolor='black')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5, 1.0, 'bins = 9')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEWCAYAAABv+EDhAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAUgUlEQVR4nO3df7RdZX3n8fcHAiqCRUsmgzFpqKWsidSiRKoRqxbboR0KdcoysByLHTXpoK1MqbMoztSusZ3Rqf1hrSJRqHSKNJbCFBxHQQv+GBymAVN+M1IKJgQhqIC2Vgl854+zgze398e5Iefsc3ner7XOuvvs/Zy9v3nWzfnc/ex9npOqQpLUnn36LkCS1A8DQJIaZQBIUqMMAElqlAEgSY0yACSpUQaAntSS3JXkVbNse1mS28ddkzQpDAA1q6o+X1VH9F3HLkl+NslNSb6V5Jokq/uuSU9uBoA0AZIcDlwI/BJwMHA5cFmSJX3WpSc3A0AteFGSW5J8I8kfJ3kqQJJXJNm2q1E3XPRrSW5I8lCSTVPaHpLk40keTPL1JJ9Psjf///xL4PNV9YWq2gm8G1gOvHwvHkPajQGgFryWwRvsc4EfBv7jHG1fAxwPHAY8H3h9t/5MYBuwFFgGnA3MOI9KFyAPzvL4wBzHzrTlAEfO94+T9pQBoBb8UVVtraqvA78NnDpH2z+squ1d28uBo7r1jwCHAj9QVY901w9mDICqen5VHTzL4/RZjvtp4OXdWcn+DAJmf+CAhf9zpeEYAGrB1inLdwPPnqPtV6cs/wNwYLf8O8AdwBVJ7kxy1t4ssKpuA04D/gi4FzgEuIXBWYc0EgaAWrBiyvJKYPtCd1BV36yqM6vqB4ETgV9NctxMbZPc3N3JM9Pjg3Mc4+KqOrKqvh94B7AK+OuF1ioNyzsM1II3J/k4g7/o3w5sWugOkpwA3Ab8LfAQ8Cjw2Extq+p5e1JkkqOBLcCzgPcDl3VnBtJIeAagFnwUuAK4k8Eb+G/twT4OZzBO/y3gi8AHquqqvVbhwHuBB4HbgW8Ab9rL+5d2E78QRpLa5BmAJDXKAJCkRhkAktQoA0CSGrUobgM95JBDatWqVX2XIUmLynXXXfdAVS2dbfuiCIBVq1axefPmvsuQpEUlyd1zbXcISJIaZQBIUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGrUoPgmsJ5/lK1ayfdvW+RuO0L77PYVHH/lOrzUAPPs5K7hn61f6LkMNMgDUi+3btrLu3Gt6rWHThrW917CrDqkPDgFJUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGmUASFKjDABJapQBIEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhplAEhSo0YWAElWJLkqyS1Jbk7y1m79bya5J8mW7vEzo6pBkjS7UX4l5E7gzKq6PslBwHVJruy2/X5VvWeEx5YkzWNkAVBV9wL3dsvfTHIrsHxUx5MkLcxYrgEkWQW8ALi2W/WWJDckOT/JM2d5zfokm5Ns3rFjxzjKlKSmjDwAkhwI/AVwRlU9DJwDPBc4isEZwu/O9Lqq2lhVa6pqzdKlS0ddpiQ1Z6QBkGQ/Bm/+F1bVJQBVdV9VPVpVjwEfAo4ZZQ2SpJmN8i6gAOcBt1bV701Zf+iUZq8GbhpVDZKk2Y3yLqCXAq8DbkyypVt3NnBqkqOAAu4CNoywBknSLEZ5F9AXgMyw6ROjOqYkaXh+EliSGmUASFKjDABJapQBIEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhplAIzJ8hUrSdL7Y/mKlX13haQJMcq5gDTF9m1bWXfuNX2XwaYNa/suQdKE8AxAkhplAEhSowwASWqUASBJjTIAJKlRBoAkNcoAkKRGGQCS1CgDQJIaZQBIUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGjWyAEiyIslVSW5Jcn
OSt3brn5XkyiRf7n4+c1Q1SJJmN8ozgJ3AmVW1Gngx8OYkq4GzgM9U1eHAZ7rnkqQxG1kAVNW9VXV9t/xN4FZgOXAScEHX7ALg50ZVgyRpdmP5Uvgkq4AXANcCy6rq3m7TV4Fls7xmPbAeYOXKlWOoshH7LCFJ31VImgAjD4AkBwJ/AZxRVQ9PffOpqkpSM72uqjYCGwHWrFkzYxvtgcd2su7ca/qugk0b1vZdgtS8kd4FlGQ/Bm/+F1bVJd3q+5Ic2m0/FLh/lDVIkmY2yruAApwH3FpVvzdl02XAad3yacBfjqoGSdLsRjkE9FLgdcCNSbZ0684G3gV8LMkbgLuB14ywBknSLEYWAFX1BWC2q43Hjeq4kqTh+ElgSWqUASBJjTIAJKlRBoAkNcoAkKRGGQCS1CgDQJIaZQBIUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGmUASFKjDABJACxfsZIkvT+Wr1jZd1c0Y5RfCSlpEdm+bSvrzr2m7zLYtGFt3yU0wzMASWqUASBJjTIAJKlRQwVAkpcOs06StHgMewbwviHXSZIWiTnvAkryEmAtsDTJr07Z9Axg31EWJkkarfluA90fOLBrd9CU9Q8DJ4+qKEnS6M0ZAFX1WeCzST5SVXePqSZJ0hgM+0GwpyTZCKya+pqq+olRFCVJGr1hA+DPgQ8CHwYeHV05kqRxGTYAdlbVOQvZcZLzgROA+6vqyG7dbwJvAnZ0zc6uqk8sZL+SpL1j2NtAL09yepJDkzxr12Oe13wEOH6G9b9fVUd1D9/8Jaknw54BnNb9fNuUdQX84GwvqKrPJVm1h3VJkkZsqACoqsP24jHfkuQXgM3AmVX1jZkaJVkPrAdYudLpYSVpbxsqALo37H+iqv5kgcc7B3gng7OHdwK/C/zbWfa9EdgIsGbNmlrgcSRJ8xh2COhFU5afChwHXA8sKACq6r5dy0k+BHx8Ia+XJO09ww4B/fLU50kOBv5soQdLcmhV3ds9fTVw00L3IUnaO/b0G8H+HpjzukCSi4BXAIck2Qa8A3hFkqMYDAHdBWzYw+NLkp6gYa8BXM7gTRsGk8D9C+Bjc72mqk6dYfV5C6pOkjQyw54BvGfK8k7g7qraNoJ6JEljMuw1gM8mWcb3LgZ/eXQlSY3ZZwlJ+q5CDRp2COg1wO8AVwMB3pfkbVV18Qhrk9rw2E7WnXtN31WwacPavkvQmA07BPR24EVVdT9AkqXApwEDQJIWqWHnAtpn15t/52sLeK0kaQINewbwySSfAi7qnq8DnMhNkhax+b4T+IeAZVX1tiT/Gji22/RF4MJRFydJGp35zgD+APh1gKq6BLgEIMmPdNt+doS1SZJGaL5x/GVVdeP0ld26VSOpSJI0FvMFwMFzbHvaXqxDkjRm8wXA5iRvmr4yyRuB60ZTkiRpHOa7BnAGcGmS1/K9N/w1wP4MZvOUJC1ScwZAN3//2iSvBI7sVv/PqvqrkVcmSRqpYecCugq4asS1SJLGyE/zSlKjDABJapQBIEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhplAEhSowwASWqUASBJjTIAJKlRBoAkNcoAkKRGjSwAkpyf5P4kN01Z96wkVyb5cvfzmaM6viRpbqM8A/gIcPy0dWcBn6mqw4HPdM8lST0YWQBU1eeAr09bfRJwQbd8AfBzozq+JGlu474GsKyq7u2Wvwosm61hkvVJNifZvGPHjvFUJ0kN6e0icFUVUHNs31hVa6pqzdKlS8dYmSS1YdwBcF+SQwG6n/eP+fiSpM64A+Ay4LRu+TTgL8d8fElSZ5S3gV4EfBE4Ism2JG8A3gX8ZJIvA6/qnkuSerBkVDuuqlNn2XTcqI4pSRqenwSWpEaN7AxgUixfsZLt27b2XYYkTZwnfQBs37aVdede03cZbNqwtu8SJGk3DgFJUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGmUASF
KjDABJapQBIEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhplAEhSowwASWqUASBJjTIAJKlRBoAkzWD5ipUk6f2xfMXKkf0bl4xsz5K0iG3ftpV1517Tdxls2rB2ZPv2DECSGmUASFKjDABJalQv1wCS3AV8E3gU2FlVa/qoQ5Ja1udF4FdW1QM9Hl+SmuYQkCQ1qq8zgAKuSFLAuVW1cXqDJOuB9QArV47uPlhJE2afJSTpu4om9BUAx1bVPUn+GXBlktuq6nNTG3ShsBFgzZo11UeRknrw2M4n/f33k6KXIaCquqf7eT9wKXBMH3VIUsvGHgBJnp7koF3LwE8BN427DklqXR9DQMuAS7sxviXAR6vqkz3UIUlNG3sAVNWdwI+O+7iSpN15G6gkNcoAkKRGGQCS1CgDQJIaZQBIUqMMAElqlAEgSY0yACSpUQaAJDXKAJCkRhkAktQoA0CSGmUASFKjDABJapQBIEmNMgAkqVEGgCQ1ygCQpEYZAJLUKANAkhplAEhSowwASWqUASBJjTIAJKlRBoAkNcoAkKRGGQCS1CgDQJIa1UsAJDk+ye1J7khyVh81SFLrxh4ASfYF3g/8NLAaODXJ6nHXIUmt6+MM4Bjgjqq6s6q+C/wZcFIPdUhS01JV4z1gcjJwfFW9sXv+OuDHquot09qtB9Z3T48Abh9robs7BHigx+MPa7HUCYunVuvcuxZLnbB4ap2rzh+oqqWzvXDJaOp54qpqI7Cx7zoAkmyuqjV91zGfxVInLJ5arXPvWix1wuKp9YnU2ccQ0D3AiinPn9OtkySNUR8B8NfA4UkOS7I/cApwWQ91SFLTxj4EVFU7k7wF+BSwL3B+Vd087joWaCKGooawWOqExVOrde5di6VOWDy17nGdY78ILEmaDH4SWJIaZQBIUqMMgGmS7JvkS0k+PsO21yfZkWRL93hjTzXeleTGrobNM2xPkj/sptq4IckLJ7TOVyR5aEp//kYfdXa1HJzk4iS3Jbk1yUumbZ+UPp2vzt77NMkRU46/JcnDSc6Y1mZS+nOYWnvv066Of5/k5iQ3JbkoyVOnbX9Kkk1dn16bZNV8+5zYzwH06K3ArcAzZtm+afqH1nryyqqa7cMfPw0c3j1+DDin+9mHueoE+HxVnTC2amb3XuCTVXVyd3faAdO2T0qfzlcn9NynVXU7cBQ8PvXLPcCl05pNRH8OWSv03KdJlgO/Aqyuqm8n+RiDOyg/MqXZG4BvVNUPJTkFeDewbq79egYwRZLnAP8K+HDftTxBJwF/UgP/Bzg4yaF9FzWpknwf8OPAeQBV9d2qenBas977dMg6J81xwN9W1d3T1vfenzOYrdZJsQR4WpIlDIJ/+7TtJwEXdMsXA8clyVw7NAB29wfAfwAem6PNz3enrBcnWTFHu1Eq4Iok13VTZky3HNg65fm2bt24zVcnwEuS/E2S/5XkeeMsborDgB3AH3fDfx9O8vRpbSahT4epEyajT3c5BbhohvWT0J/TzVYr9NynVXUP8B7gK8C9wENVdcW0Zo/3aVXtBB4Cvn+u/RoAnSQnAPdX1XVzNLscWFVVzweu5HtpO27HVtULGZxGvznJj/dUx3zmq/N6BnOV/CjwPuB/jLm+XZYALwTOqaoXAH8PTOI05cPUOSl9SjdEdSLw533VMKx5au29T5M8k8Ff+IcBzwaenuTfPNH9GgDf81LgxCR3MZih9CeS/OnUBlX1tar6Tvf0w8DR4y3x8Tru6X7ez2C88phpTSZiuo356qyqh6vqW93yJ4D9khwy7joZ/PW5raqu7Z5fzOCNdqpJ6NN565ygPoVB8F9fVffNsG0S+nOqWWudkD59FfB3VbWjqh4BLgHWTmvzeJ92w0TfB3xtrp0aAJ2q+vWqek5VrWJwKvhXVbVbwk4bozyRwcXisUry9CQH7VoGfgq4aVqzy4Bf6O60eDGD08V7J63OJP981xhlkmMY/D7O+Qs7ClX1VWBrkiO6VccBt0
xr1nufDlPnpPRp51RmH1LpvT+nmbXWCenTrwAvTnJAV8tx/NP3n8uA07rlkxm8h835SV/vAppHkv8MbK6qy4BfSXIisBP4OvD6HkpaBlza/T4uAT5aVZ9M8ksAVfVB4BPAzwB3AP8A/OKE1nky8O+S7AS+DZwy3y/sCP0ycGE3FHAn8IsT2KfD1DkRfdqF/k8CG6asm8T+HKbW3vu0qq5NcjGD4aidwJeAjdPen84D/nuSOxi8P50y336dCkKSGuUQkCQ1ygCQpEYZAJLUKANAkhplAEhSowwALXpJ3t7NknhDBrM17rVJxTKYCXKmmWFXJZn++Yu9KsnZ4zye2mMAaFHLYDrkE4AXdlN0vIrd55hZzM6ev4m05wwALXaHAg/smqKjqh6oqu1Jjk7y2W4iuk/t+hR3kquTvLc7U7ip+2QnSY5J8sVukrVrpnzadkHmOe67k/zfJP8vycu69Qck+ViSW5JcmsE87muSvIvBzI9bklzY7X7fJB/qznauSPK0J9h3apwBoMXuCmBF96b6gSQvT7Ifg0m7Tq6qo4Hzgd+e8poDquoo4PRuG8BtwMu6SdZ+A/gvCy1kiOMuqapjgDOAd3TrTmcwh/tq4D/RzS9VVWcB366qo6rqtV3bw4H3V9XzgAeBn19ojdJUTgWhRa2qvpXkaOBlwCuBTcBvAUcCV3ZTUezLYArdXS7qXvu5JM9IcjBwEHBBksMZTGO93x6Uc8Q8x72k+3kdsKpbPpbBl7xQVTcluWGO/f9dVW2ZYR/SHjEAtOhV1aPA1cDVSW4E3gzcXFUvme0lMzx/J3BVVb06g6/Su3oPSsk8x901k+yj7Nn/ve9MWX4UcAhIT4hDQFrUMvhO18OnrDqKwSyJS7sLxCTZL7t/ice6bv2xDGahfIjB1Lm7piN+/R6Wc/s8x53J/wZe07VfDfzIlG2PdMNK0kgYAFrsDmQwdHNLN3yymsEY/snAu5P8DbCF3edO/8ckXwI+yOB7VAH+G/Bfu/Uz/nXeXZyd+nWhRyTZtuvB4As75jruTD7AIDRuYTB0dTODb3IC2AjcMOUisLRXORuompLkauDXqmpz37XA419Evl9V/WOS5wKfBo6oqu/2XJoa4DUAqV8HAFd1Qz0BTvfNX+PiGYAkNcprAJLUKANAkhplAEhSowwASWqUASBJjfr/6lQTy1MZ/EQAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Seaborn을 이용한 시각화\n",
"sns.histplot(x=df_iris['Sepal.Length'],bins=n_bins)\n",
"plt.title(f'bins = {n_bins}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 이산확률분포\n",
"\n",
"확률변수가 취할 수 있는 값이 이산적인 확률변수\n",
"\n",
"**확률질량함수(Probability mass function)**\n",
"\n",
"확률변수 X 가 취할 수 있는 값에 대한 집합 $X = {x_1, x_2, x_3, ...}$ 일때, 확률변수 X가 $x_k$를 취할 확률\n",
"\n",
"$$f(x) = P(X = x_k) = p_k \\quad (k = 1, 2, 3, ...)$$\n",
"$$f(x) = 0 \\quad (otherwise)$$\n",
"\n",
"**누적분포함수(Cumulative distribution function)**\n",
"\n",
"확률변수 X가 x이하가 될 때의 확률\n",
"\n",
"$$F(x) = P(X \\leq x_k) = \\sum_{x_k \\leq x}f(x_k)$$ \n",
"\n",
"**기대값(Expted value)**\n",
"\n",
"확률변수의 평균\n",
"\n",
"$$E(X) = \\sum_kx_kf(x_k)$$\n",
"\n",
"**분산(Variance)**\n",
"\n",
"$$V(X) = \\sum_k(x_k - \\mu)^2f(x_k)$$"
]
},
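{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Worked example: a fair die**\n",
"\n",
"To make the definitions above concrete, here is a small hand calculation for a fair six-sided die, assuming $f(x_k) = 1/6$ for $x_k \\in \\{1, 2, \\ldots, 6\\}$ (an illustrative choice, not a value taken from the data in this notebook):\n",
"\n",
"$$E(X) = \\sum_{k=1}^{6} k \\cdot \\frac{1}{6} = \\frac{21}{6} = 3.5$$\n",
"\n",
"$$V(X) = \\sum_{k=1}^{6} (k - 3.5)^2 \\cdot \\frac{1}{6} = \\frac{17.5}{6} = \\frac{35}{12} \\approx 2.917$$\n",
"\n",
"The cumulative distribution function at, say, $x = 2$ is $F(2) = f(1) + f(2) = 1/3$."
]
},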
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 이산균등분포 (Discrete Uniform Distribution)\n",
"\n",
"사상이 일어나는 확률이 같은 분포, 확률변수가 x = {1, ..., n} 1,2,3, .. 값은 이산 값을 취할때,\n",
"\n",
"**확률질량함수**\n",
"\n",
"베르누이 분포에서 1이 나오는 확률을 p 라고 할때, \n",
"\n",
"$$f(x) = \\frac{1}{n} \\quad (x \\in {1, 2, ..., n})$$\n",
"$$f(x) = 0 \\quad (otherwise)$$\n",
"\n",
"**기대값과 분산**\n",
"\n",
"$$E(X) = \\frac{n+1}{2} \\quad V(X) = \\frac{n^2 - 1}{12}$$\n",
"\n",
"\n",
"\n",
"https://www.researchgate.net/figure/Uniform-Distribution-Types_fig3_319013233\n",
"\n",
"**Further Reading**\n",
"* [이산균등분포](https://ko.wikipedia.org/wiki/%EC%9D%B4%EC%82%B0%EA%B7%A0%EB%93%B1%EB%B6%84%ED%8F%AC)\n",
"* [Discrete uniform distribution](https://en.wikipedia.org/wiki/Discrete_uniform_distribution)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution '), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAEGCAYAAACkQqisAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVb0lEQVR4nO3dfbRddX3n8fdH4gM4KiCRaoAGLZVBi0rDQ5djfaAiamuYGXRpbY0OmlktttVpp4LTVRwtXdgHqdSqZYThoSgipZJRHCeiaOsaHoKggg+TLBVJBIkGxSoVg9/54/xuOYR7w8kv95yTm/t+rXXX3fu3f3vv79kk+bAfzm+nqpAkqcdDpl2AJGnhMkQkSd0MEUlSN0NEktTNEJEkdVsy7QImbb/99qvly5dPuwxJWjCuv/7671TV0tmWLboQWb58OevWrZt2GZK0YCS5Za5lXs6SJHUzRCRJ3QwRSVI3Q0SS1M0QkSR1M0QkSd0MEUlSN0NEktRtbCGS5NwkdyS5aaht3yRrk6xvv/dp7UlyVpINSb6Q5IihdVa1/uuTrBpq/8UkX2zrnJUk4/oskqTZjfMb6+cB7wIuGGo7Bbiyqs5IckqbfxPwQuCQ9nM08B7g6CT7AqcBK4ACrk+ypqrubH1eB1wDXAEcD3xsjJ+HM274zqztpzxjv3HuVtICtFj+vRhbiFTVZ5Is36Z5JfCcNn0+cBWDEFkJXFCD1yxenWTvJI9vfddW1RaAJGuB45NcBTy6qq5u7RcAJzDmENH0LZa/mNJCMemxs/avqtva9O3A/m16GXDrUL+NrW177RtnaZ9VktXAaoCDDjpoJ8rXDP8xnz7/G2hXMLUBGKuqkkzkBe9VdTZwNsCKFSt8qbw0JtMKNgP1PpM+FpN+Ouvb7TIV7fcdrX0TcOBQvwNa2/baD5ilXZI0QZM+E1kDrALOaL8vH2p/fZKLGdxY/35V3Zbk48CfzjzFBRwHnFpVW5LcleQYBjfWXwX89SQ/yCT5f1m7Nv/7aDEbW4gk+QCDG+P7JdnI4CmrM4BLkpwE3AK8rHW/AngRsAH4EfAagBYWbwOua/3eOnOTHfhtBk+A7cnghro31SVpwsb5dNYr5lh07Cx9Czh5ju2cC5w7S/s64Kk7U6Mkaef4jXVJUjdDRJLUzRCRJHUzRCRJ3QwRSVI3Q0SS1M0QkSR1M0QkSd0MEUlSN0NEktTNEJEkdTNEJEndDBFJUjdDRJLUzRCRJHUzRCRJ3QwRSVI3Q0SS1M0QkSR1M0QkSd0MEUlSN0NEktTNEJEkdTNEJEndDBFJUjdDRJLUzRCRJHUzRCRJ3QwRSVI3Q0SS1M0QkSR1M0QkSd2mEiJJ3pjk5iQ3JflAkkckOTjJNUk2JPlgkoe1vg9v8xva8uVD2zm1tX81yQum8VkkaTGbeIgkWQb8LrCiqp4K7AG8HHg7cGZV/RxwJ3BSW+Uk4M7WfmbrR5LD2npPAY4H3p1kj0l+Fkla7KZ1OWsJsGeSJcBewG3A84BL2/LzgRPa9Mo2T1t+bJK09our6sdV9XVgA3DUZMqXJMEUQqSqNgF/AXyTQXh8H7ge+F5VbW3dNgLL2vQy4Na27tbW/7HD7bOscz9JVidZl2Td5s2b5/cDSdIiNo3LWfswOIs4GHgC8EgGl6PGpqrOrqoVVbVi6dKl49yVJC0q07ic9SvA16tqc1X9BLgMeCawd7u8BXAAsKlNbwIOBGjLHwN8d7h9lnUkSRMwjRD5JnBMkr3avY1jgS8BnwJObH1WAZe36TVtnrb8k1VVrf3l7emtg4FDgGsn9BkkSQxucE9UVV2T5FLgc8BW4AbgbOCjwMVJ/qS1ndNWOQe4MMkGYAuDJ7KoqpuTXMIggLYCJ1fVvRP9MJK0yE08RACq6jTgtG2av8YsT1dV1b8AL51jO6cDp897gZKkkfiNdUlSN0NEktTNEJEkdTNEJEndDBFJUjdDRJLUzRCRJHUzRCRJ3Q
wRSVI3Q0SS1M0QkSR1M0QkSd0MEUlSN0NEktTNEJEkdTNEJEndDBFJUjdDRJLUzRCRJHUzRCRJ3QwRSVI3Q0SS1M0QkSR1M0QkSd0MEUlSN0NEktRtpBBJ8gvjLkSStPCMeiby7iTXJvntJI8Za0WSpAVjpBCpqmcBrwQOBK5P8v4kzx9rZZKkXd7I90Sqaj3wR8CbgGcDZyX5SpL/MK7iJEm7tlHviRye5Ezgy8DzgF+rqn/bps8cY32SpF3YkhH7/TXwPuDNVXX3TGNVfSvJH42lMknSLm/Uy1kvBt4/EyBJHpJkL4CqunBHd5pk7ySXtsthX07yS0n2TbI2yfr2e5/WN0nOSrIhyReSHDG0nVWt//okq3a0DknSzhk1RD4B7Dk0v1dr6/VO4H9X1aHA0xhcJjsFuLKqDgGubPMALwQOaT+rgfcAJNkXOA04GjgKOG0meCRJkzFqiDyiqv55ZqZN79Wzw/aI8C8D57Rt3VNV3wNWAue3bucDJ7TplcAFNXA1sHeSxwMvANZW1ZaquhNYCxzfU5Mkqc+oIfLDbS4j/SJw93b6b8/BwGbgfya5Icn7kjwS2L+qbmt9bgf2b9PLgFuH1t/Y2uZqf4Akq5OsS7Ju8+bNnWVLkrY1aoi8AfhQkn9M8k/AB4HXd+5zCXAE8J6qegbwQ+67dAVAVRVQndt/gKo6u6pWVNWKpUuXztdmJWnRG+nprKq6LsmhwJNb01er6ied+9wIbKyqa9r8pQxC5NtJHl9Vt7XLVXe05ZsYfMlxxgGtbRPwnG3ar+qsSZLUYUcGYDwSOJzBWcQrkryqZ4dVdTtwa5KZQDoW+BKwBph5wmoVcHmbXgO8qj2ldQzw/XbZ6+PAcUn2aTfUj2ttkqQJGelMJMmFwJOAG4F7W3MBF3Tu93eAi5I8DPga8BoGgXZJkpOAW4CXtb5XAC8CNgA/an2pqi1J3gZc1/q9taq2dNYjSeow6pcNVwCHtXsVO62qbmzb3Naxs/Qt4OQ5tnMucO581CRJ2nGjXs66CfiZcRYiSVp4Rj0T2Q/4UpJrgR/PNFbVS8ZSlSRpQRg1RN4yziIkSQvTqI/4fjrJzwKHVNUn2rhZe4y3NEnSrm7UoeBfx+D7HH/bmpYBHx5TTZKkBWLUG+snA88E7oJ/fUHV48ZVlCRpYRg1RH5cVffMzCRZwjwOSyJJWphGDZFPJ3kzsGd7t/qHgP81vrIkSQvBqCFyCoORd78I/GcG3yL3jYaStMiN+nTWT4H/0X4kSQJGHzvr68xyD6SqnjjvFUmSFowdGTtrxiOAlwL7zn85kqSFZKR7IlX13aGfTVX1V8CLx1uaJGlXN+rlrCOGZh/C4Mxk1LMYSdJuatQg+Muh6a3AN7jvfR+SpEVq1KeznjvuQiRJC8+ol7P+y/aWV9U75qccSdJCsiNPZx3J4H3nAL8GXAusH0dRkqSFYdQQOQA4oqp+AJDkLcBHq+o3xlWYJGnXN+qwJ/sD9wzN39PaJEmL2KhnIhcA1yb5hzZ/AnD+WCqSJC0Yoz6ddXqSjwHPak2vqaobxleWJGkhGPVyFsBewF1V9U5gY5KDx1STJGmBGPX1uKcBbwJObU0PBf5uXEVJkhaGUc9E/j3wEuCHAFX1LeBR4ypKkrQwjBoi91RV0YaDT/LI8ZUkSVooRg2RS5L8LbB3ktcBn8AXVEnSovegT2clCfBB4FDgLuDJwB9X1dox1yZJ2sU9aIhUVSW5oqp+ATA4JEn/atTLWZ9LcuRYK5EkLTijfmP9aOA3knyDwRNaYXCScvi4CpMk7fq2GyJJDqqqbwIvmFA9kqQF5MEuZ30YoKpuAd5RVbcM/+zMjpPskeSGJB9p8wcnuSbJhiQfTPKw1v7wNr+hLV8+tI1TW/tXkxh0kjRhDxYiGZp+4jzv+/eALw/Nvx04s6p+DrgTOKm1nwTc2drPbP1IchjwcuApwPHAu5PsMc81SpK248FCpOaY3ilJDgBeDLyvzQd4Hn
Bp63I+g5GCAVZy34jBlwLHtv4rgYur6sdV9XVgA3DUfNUoSXpwD3Zj/WlJ7mJwRrJnm4b7bqw/unO/fwX8IfcNnfJY4HtVtbXNbwSWtellwK0Mdrg1yfdb/2XA1UPbHF7nfpKsBlYDHHTQQZ0lS5K2td0zkarao6oeXVWPqqolbXpmvitAkvwqcEdVXd9VcYeqOruqVlTViqVLl05qt5K02xv1Ed/59EzgJUleBDwCeDTwTgZDqixpZyMHAJta/03AgQyGn18CPAb47lD7jOF1JEkTsCPvE5kXVXVqVR1QVcsZ3Bj/ZFW9EvgUcGLrtgq4vE2vafO05Z9sg0GuAV7ent46GDgEuHZCH0OSxHTORObyJuDiJH8C3ACc09rPAS5MsgHYwiB4qKqbk1wCfAnYCpxcVfdOvmxJWrymGiJVdRVwVZv+GrM8XVVV/wK8dI71TwdOH1+FkqTtmfjlLEnS7sMQkSR1M0QkSd0MEUlSN0NEktTNEJEkdTNEJEndDBFJUjdDRJLUzRCRJHUzRCRJ3QwRSVI3Q0SS1M0QkSR1M0QkSd0MEUlSN0NEktTNEJEkdTNEJEndDBFJUjdDRJLUzRCRJHUzRCRJ3QwRSVI3Q0SS1M0QkSR1M0QkSd0MEUlSN0NEktTNEJEkdTNEJEndDBFJUreJh0iSA5N8KsmXktyc5Pda+75J1iZZ337v09qT5KwkG5J8IckRQ9ta1fqvT7Jq0p9Fkha7aZyJbAV+v6oOA44BTk5yGHAKcGVVHQJc2eYBXggc0n5WA++BQegApwFHA0cBp80EjyRpMiYeIlV1W1V9rk3/APgysAxYCZzfup0PnNCmVwIX1MDVwN5JHg+8AFhbVVuq6k5gLXD85D6JJGmq90SSLAeeAVwD7F9Vt7VFtwP7t+llwK1Dq21sbXO1z7af1UnWJVm3efPm+fsAkrTITS1Ekvwb4O+BN1TVXcPLqqqAmq99VdXZVbWiqlYsXbp0vjYrSYveVEIkyUMZBMhFVXVZa/52u0xF+31Ha98EHDi0+gGtba52SdKETOPprADnAF+uqncMLVoDzDxhtQq4fKj9Ve0prWOA77fLXh8HjkuyT7uhflxrkyRNyJIp7POZwG8CX0xyY2t7M3AGcEmSk4BbgJe1ZVcALwI2AD8CXgNQVVuSvA24rvV7a1VtmcgnkCQBUwiRqvonIHMsPnaW/gWcPMe2zgXOnb/qJEk7wm+sS5K6GSKSpG6GiCSpmyEiSepmiEiSuhkikqRuhogkqZshIknqZohIkroZIpKkboaIJKmbISJJ6maISJK6GSKSpG6GiCSpmyEiSepmiEiSuhkikqRuhogkqZshIknqZohIkroZIpKkboaIJKmbISJJ6maISJK6GSKSpG6GiCSpmyEiSepmiEiSuhkikqRuhogkqZshIknqtuBDJMnxSb6aZEOSU6ZdjyQtJgs6RJLsAfwN8ELgMOAVSQ6bblWStHgs6BABjgI2VNXXquoe4GJg5ZRrkqRFI1U17Rq6JTkROL6qXtvmfxM4uqpev02/1cDqNvtk4Kudu9wP+E7nursbj8X9eTzuz+Nxn93hWPxsVS2dbcGSSVcyDVV1NnD2zm4nybqqWjEPJS14Hov783jcn8fjPrv7sVjol7M2AQcOzR/Q2iRJE7DQQ+Q64JAkByd5GPByYM2Ua5KkRWNBX86qqq1JXg98HNgDOLeqbh7jLnf6kthuxGNxfx6P+/N43Ge3PhYL+sa6JGm6FvrlLEnSFBkikqRuhsgckpyb5I4kNw217ZtkbZL17fc+06xxUuY4Fn+e5CtJvpDkH5LsPcUSJ2q24zG07PeTVJL9plHbpM11LJL8TvvzcXOSP5tWfZM2x9+Vpye5OsmNSdYlOWqaNc43Q2Ru5wHHb9N2CnBlVR0CXNnmF4PzeOCxWAs8taoOB/4fcOqki5qi83jg8SDJgcBxwDcnXdAUncc2xyLJcxmMHPG0qnoK8BdTqGtazuOBfzb+DP
jvVfV04I/b/G7DEJlDVX0G2LJN80rg/DZ9PnDCJGualtmORVX9n6ra2mavZvAdnUVhjj8bAGcCfwgsmqdV5jgWvwWcUVU/bn3umHhhUzLH8Sjg0W36McC3JlrUmBkiO2b/qrqtTd8O7D/NYnYh/wn42LSLmKYkK4FNVfX5adeyC/h54FlJrkny6SRHTrugKXsD8OdJbmVwVrZbnbUbIp1q8Gz0ovk/zrkk+W/AVuCiadcyLUn2At7M4FKFBt8/2xc4BvivwCVJMt2Spuq3gDdW1YHAG4FzplzPvDJEdsy3kzweoP1eNKfps0nyauBXgVfW4v7C0ZOAg4HPJ/kGg0t7n0vyM1Otano2ApfVwLXATxkMQrhYrQIua9MfYjD6+G7DENkxaxj8gaD9vnyKtUxVkuMZXP9/SVX9aNr1TFNVfbGqHldVy6tqOYN/RI+oqtunXNq0fBh4LkCSnwcexsIfxXZnfAt4dpt+HrB+irXMO0NkDkk+APxf4MlJNiY5CTgDeH6S9cCvtPnd3hzH4l3Ao4C17dHF9061yAma43gsSnMci3OBJ7bHXC8GVi2WM9U5jsfrgL9M8nngT7nvtRS7BYc9kSR180xEktTNEJEkdTNEJEndDBFJUjdDRJLUzRCRtpHk3vbY8s1JPt9G5n1IW7YiyVnbWXd5kl/fzvInJLm0Tb86ybt2sLZXJ3nC0Pz7khy2I9uQ5tOCfj2uNCZ3txFXSfI44P0MBtA7rarWAeu2s+5y4NfbOveTZElVfQs4cSdqezVwE20Qv6p67U5sS9ppnolI29FGoF0NvD4Dz0nyEYAkz25nLDcmuSHJoxh8AfVZre2N7cxhTZJPAle2M5Xhd28cmOSq9o6a09p279cnyR8keUuSE4EVwEVt+3u2dVe0fq9I8sUkNyV5+9D6/5zk9HZWdXUSBw7VvDFEpAdRVV8D9gAet82iPwBObmctzwLuZvCOmX+sqqdX1Zmt3xHAiVX1bB7oKOA/AocDL50JhDnquJTBWdAr2/bvnlnWLnG9ncGwGk8HjkxyQlv8SODqqnoa8BkG36CW5oUhIvX7LPCOJL8L7D30fpVtra2q2d4/MrPsuy0QLgP+XWctRwJXVdXmVsdFwC+3ZfcAH2nT1zO45CbNC0NEehBJngjcyzajNlfVGcBrgT2BzyY5dI5N/HA7m9923KFiMLT+8N/NR+xQwQ/0k6Gxq+7Fe6GaR4aItB1JlgLvBd617SCCSZ7URvB9O3AdcCjwAwYDU47q+Un2TbIngzdlfhb4NvC4JI9N8nAGw+3PmGv71wLPTrJfkj2AVwCf3oE6pC7+H4n0QHsmuRF4KIOzgguBd8zS7w3tfeI/BW5m8HbHnwL3thFbzwPufJB9XQv8PYN3kPxde/qLJG9tyzYBXxnqfx7w3iR3A78001hVtyU5BfgUEOCjVbVoX1WgyXEUX0lSNy9nSZK6GSKSpG6GiCSpmyEiSepmiEiSuhkikqRuhogkqdv/BxYiD5QeTg+jAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"rv = sp.stats.randint(low=10, high=20)\n",
"rvs = rv.rvs(size=100000)\n",
"ax = sns.distplot(rvs, bins=50, kde=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution ', ylabel='Frequency')"
]
},
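{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Worked example: mean and variance of the sampled distribution**\n",
"\n",
"The formulas in this section assume the values $1, \\ldots, n$, whereas `randint(low=10, high=20)` draws the integers $10, \\ldots, 19$ (the upper bound is exclusive in SciPy). As a sketch of how the formulas carry over: there are $n = 10$ values, and they equal $Y + 9$ for $Y$ uniform on $\\{1, \\ldots, 10\\}$. A shift changes the mean but not the variance, so\n",
"\n",
"$$E(X) = \\frac{10 + 1}{2} + 9 = 14.5 \\qquad V(X) = \\frac{10^2 - 1}{12} = 8.25$$\n",
"\n",
"With 100,000 samples, each of the 10 values should appear roughly 10,000 times."
]
},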
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 베르누이분포\n",
"\n",
"확률변수가 취할 수 있는 값이 0/1 밖에 없는 분포, 가장 기본적인 이산형 확률분포\n",
"\n",
"**확률질량함수**\n",
"\n",
"베르누이 분포에서 1이 나오는 확률을 p 라고 할때, \n",
"\n",
"$$f(x) = p^x(1-p)^{(1-x)} \\quad (x \\in {0, 1})$$\n",
"$$f(x) = 0 \\quad (otherwise)$$\n",
"\n",
"**기대값과 분산**\n",
"\n",
"$$E(X) = p\\quad V(X) = p(1-p)$$"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"f(0)=0.700, f(1)=0.300\n",
"F(0)=0.700, F(1)=1.000\n",
"E(X)=0.300, V(X)=0.210\n"
]
},
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution '), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"p = 0.3\n",
"rv = sp.stats.bernoulli(p)\n",
"print(f\"f(0)={rv.pmf(0):.3f}, f(1)={rv.pmf(1):.3f}\")\n",
"print(f\"F(0)={rv.cdf(0):.3f}, F(1)={rv.cdf(1):.3f}\")\n",
"print(f\"E(X)={rv.mean():.3f}, V(X)={rv.var():.3f}\")\n",
"\n",
"rvs = rv.rvs(size=100000)\n",
"ax = sns.distplot(rvs, bins=3, kde=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution ', ylabel='Frequency')"
]
},
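{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Worked example: deriving the Bernoulli mean and variance**\n",
"\n",
"The printed values for $p = 0.3$ can be checked by hand from the general definitions of $E(X)$ and $V(X)$ for a discrete random variable. With $f(0) = 1 - p$ and $f(1) = p$:\n",
"\n",
"$$E(X) = 0 \\cdot (1-p) + 1 \\cdot p = p = 0.3$$\n",
"\n",
"$$V(X) = (0-p)^2(1-p) + (1-p)^2 p = p(1-p)\\,[\\,p + (1-p)\\,] = p(1-p) = 0.3 \\times 0.7 = 0.21$$\n",
"\n",
"The cumulative values follow directly: $F(0) = f(0) = 0.7$ and $F(1) = f(0) + f(1) = 1$."
]
},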
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 이항분포 (Binomial Distribution)\n",
"\n",
"성공확률이 p인 베르누이 시행을 n번 했을 때의 성공 횟수가 따르는 분포\n",
"\n",
"**확률질량함수**\n",
"\n",
"연속된 n번의 독립적 시행에서 각 시행이 성공 확률 p를 가질 때, \n",
"\n",
"$$f(x) = nCxp^x(1-p)^{(n-x)} \\quad (x \\in {0, 1, 2, ... n})$$\n",
"$$f(x) = 0 \\quad (otherwise)$$\n",
"\n",
"**기대값과 분산**\n",
"\n",
"$$E(X) = np\\quad V(X) = np(1-p)$$\n",
"\n",
"\n",
"\n",
"https://en.wikipedia.org/wiki/Binomial_distribution\n",
"\n",
"**Further Reading**\n",
"* [이항 분포](https://ko.wikipedia.org/wiki/%EC%9D%B4%ED%95%AD_%EB%B6%84%ED%8F%AC)\n",
"* [Binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution)"
]
},
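{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Worked example (sketch)**\n",
"\n",
"To make the formula concrete, assume $n = 10$ tosses of a fair coin, so $p = 0.5$. These are the values used in the code cell of this section and are otherwise arbitrary.\n",
"\n",
"$$f(1) = \\binom{10}{1} 0.5^{10} = \\frac{10}{1024} \\approx 0.010 \\quad f(8) = \\binom{10}{8} 0.5^{10} = \\frac{45}{1024} \\approx 0.044$$\n",
"$$F(1) = f(0) + f(1) = \\frac{1 + 10}{1024} \\approx 0.011$$\n",
"$$E(X) = 10 \\cdot 0.5 = 5 \\quad V(X) = 10 \\cdot 0.5 \\cdot 0.5 = 2.5$$"
]
},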
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"f(1)=0.010, f(8)=0.044\n",
"F(1)=0.011, F(8)=0.989\n",
"E(X)=5.000, V(X)=2.500\n"
]
},
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution'), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Number of successes in 10 tosses of a fair coin (success probability 0.5)\n",
"n = 10\n",
"p = 0.5\n",
"rv = sp.stats.binom(n=n,p=p)\n",
"print(f\"f(1)={rv.pmf(1):.3f}, f(8)={rv.pmf(8):.3f}\")\n",
"print(f\"F(1)={rv.cdf(1):.3f}, F(8)={rv.cdf(8):.3f}\")\n",
"print(f\"E(X)={rv.mean():.3f}, V(X)={rv.var():.3f}\")\n",
"\n",
"rvs = rv.rvs(size=10000)\n",
"ax = sns.distplot(rvs, kde=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution', ylabel='Frequency')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
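"### Geometric Distribution\n",
"\n",
"The distribution of the number of Bernoulli trials needed to obtain the first success.\n",
"\n",
"**Probability mass function**\n",
"\n",
"With success probability $p$, the random variable takes integer values of 1 or more. The first success can be preceded by any number of consecutive failures, so there is no upper bound.\n",
"\n",
"$$f(x) = (1-p)^{(x-1)}p \\quad (x \\in \\{1, 2, 3, \\ldots\\})$$\n",
"$$f(x) = 0 \\quad (\\text{otherwise})$$\n",
"\n",
"**Expected value and variance**\n",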
"\n",
"$$E(X) = \\frac{1}{p} \\quad V(X) = \\frac{(1-p)}{p^2}$$\n",
"\n",
"**Further Reading**\n",
"* [Geometric distribution](https://en.wikipedia.org/wiki/Geometric_distribution)"
]
},
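{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Worked example (sketch)**\n",
"\n",
"As an illustration, assume a fair coin with $p = 0.5$, the value used in the code cell of this section. The CDF identity $F(x) = 1 - (1-p)^x$ is a standard result that is not stated above; it is the probability that the first $x$ trials are not all failures.\n",
"\n",
"$$f(1) = 0.5 \\quad f(8) = 0.5^{7} \\cdot 0.5 = \\frac{1}{256} \\approx 0.004$$\n",
"$$F(8) = 1 - 0.5^{8} = \\frac{255}{256} \\approx 0.996$$\n",
"$$E(X) = \\frac{1}{0.5} = 2 \\quad V(X) = \\frac{1 - 0.5}{0.5^2} = 2$$"
]
},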
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"f(1)=0.500, f(8)=0.004\n",
"F(1)=0.500, F(8)=0.996\n",
"E(X)=2.000, V(X)=2.000\n"
]
},
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution'), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Probability that the first success occurs on the x-th toss of a fair coin (p = 0.5)\n",
"p = 0.5\n",
"rv = sp.stats.geom(p=p)\n",
"print(f\"f(1)={rv.pmf(1):.3f}, f(8)={rv.pmf(8):.3f}\")\n",
"print(f\"F(1)={rv.cdf(1):.3f}, F(8)={rv.cdf(8):.3f}\")\n",
"print(f\"E(X)={rv.mean():.3f}, V(X)={rv.var():.3f}\")\n",
"\n",
"rvs = rv.rvs(size=10000)\n",
"ax = sns.distplot(rvs, kde=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution', ylabel='Frequency')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
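"### Poisson Distribution\n",
"\n",
"A discrete probability distribution for the number of times an event occurs within a unit of time or space. It arises as the limit of a binomial distribution with a large number of trials and a very small probability per trial.\n",
"\n",
"**Probability mass function**\n",
"\n",
"If the expected number of occurrences in a fixed interval is $\\lambda = np$, the probability that the event occurs $x$ times is\n",
"\n",
"$$f(x) = \\frac{\\lambda^x e^{-\\lambda}}{x!} \\quad (x \\in \\{0, 1, 2, \\ldots\\})$$\n",
"$$f(x) = 0 \\quad (\\text{otherwise})$$\n",
"\n",
"**Expected value and variance**\n",
"\n",
"$$E(X) = \\lambda \\quad V(X) = \\lambda$$\n",
"\n",
"As $\\lambda$ grows, the distribution approaches a normal distribution.\n",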
"\n",
"**Further Reading**\n",
"* [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution)"
]
},
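{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Worked example (sketch)**\n",
"\n",
"As a rough check, assume $n = 1000$ items with defect probability $p = 0.002$, so $\\lambda = np = 2$. These are the values used in the code cell of this section.\n",
"\n",
"$$f(0) = e^{-2} \\approx 0.135 \\quad f(1) = 2e^{-2} \\approx 0.271$$\n",
"$$F(1) = f(0) + f(1) = 3e^{-2} \\approx 0.406$$\n",
"$$E(X) = V(X) = \\lambda = 2$$"
]
},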
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"f(1)=0.271, f(8)=0.001\n",
"F(1)=0.406, F(8)=1.000\n",
"E(X)=2.000, V(X)=2.000\n"
]
},
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution'), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
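"# With defect rate p = 0.002 and n = 1000 items, the expected number of defects is 2; compute the probability of exactly 1 defect (x = 1)\n",
"p = 0.002\n",
"n = 1000\n",
"lam = n*p  # Poisson rate: expected number of defects\n",
"rv = sp.stats.poisson(lam)\n",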
"\n",
"print(f\"f(1)={rv.pmf(1):.3f}, f(8)={rv.pmf(8):.3f}\")\n",
"print(f\"F(1)={rv.cdf(1):.3f}, F(8)={rv.cdf(8):.3f}\")\n",
"print(f\"E(X)={rv.mean():.3f}, V(X)={rv.var():.3f}\")\n",
"\n",
"# Draw 10,000 random samples from the Poisson distribution and plot them\n",
"rvs = rv.rvs(size=10000)\n",
"ax = sns.distplot(rvs, bins=30, kde=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution', ylabel='Frequency')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Continuous Probability Distributions\n",
"\n",
"Distributions of random variables that take values on a continuum.\n",
"\n",
"**Probability density function (PDF)**\n",
"\n",
"For a continuous random variable $X$, the probability of any single value is 0. Instead, the density $f(x)$ gives the probability that $X$ falls in an interval $[x_0, x_1]$:\n",
"\n",
"$$P(x_0 \\leq X \\leq x_1) = \\int_{x_0}^{x_1}f(x)dx$$\n",
"\n",
"**Cumulative distribution function (CDF)**\n",
"\n",
"The probability that $X$ is at most $x$.\n",
"\n",
"$$F(x) = P(X \\leq x) = \\int_{- \\infty}^{x}f(t)dt$$\n",
"\n",
"**Expected value**\n",
"\n",
"The mean of the random variable.\n",
"\n",
"$$E(X) = \\int_{- \\infty}^{\\infty}xf(x)dx$$\n",
"\n",
"**Variance**\n",
"\n",
"$$V(X) = \\int_{- \\infty}^{\\infty}(x - E(X))^2f(x)dx$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Continuous Uniform Distribution\n",
"\n",
"A distribution in which every value in an interval is equally likely. When the random variable takes continuous values in $[\\alpha, \\beta]$,\n",
"\n",
"**Probability density function**\n",
"\n",
"$$f(x) = \\frac{1}{\\beta - \\alpha} \\quad (x \\in [\\alpha, \\beta])$$\n",
"$$f(x) = 0 \\quad (\\text{otherwise})$$\n",
"\n",
"**Expected value and variance**\n",
"\n",
"$$E(X) = \\frac{\\alpha + \\beta}{2} \\quad V(X) = \\frac{(\\beta - \\alpha)^2}{12}$$\n",
"\n",
"\n",
"\n",
"https://www.researchgate.net/figure/Uniform-Distribution-Types_fig3_319013233\n",
"\n",
"**Further Reading**\n",
"* [연속균등분포](https://ko.wikipedia.org/wiki/%EC%97%B0%EC%86%8D%EA%B7%A0%EB%93%B1%EB%B6%84%ED%8F%AC)\n",
"* [Continuous uniform distribution](https://en.wikipedia.org/wiki/Continuous_uniform_distribution)"
]
},
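{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Worked example (sketch)**\n",
"\n",
"As an illustration, assume $\\alpha = 10$ and $\\beta = 20$. These endpoints correspond to `loc=10, scale=10` in the code cell of this section; the sub-interval $[12, 15]$ is chosen arbitrarily for the example.\n",
"\n",
"$$f(x) = \\frac{1}{20 - 10} = 0.1 \\quad (10 \\leq x \\leq 20)$$\n",
"$$P(12 \\leq X \\leq 15) = \\int_{12}^{15} 0.1\\,dx = 0.3$$\n",
"$$E(X) = \\frac{10 + 20}{2} = 15 \\quad V(X) = \\frac{(20 - 10)^2}{12} \\approx 8.333$$"
]
},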
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution '), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# loc=10, scale=10 gives the uniform distribution on [loc, loc + scale] = [10, 20]\n",
"rv = sp.stats.uniform(loc=10, scale=10)\n",
"rvs = rv.rvs(size=100000)\n",
"ax = sns.distplot(rvs, bins=100, kde=True, hist=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution ', ylabel='Frequency')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
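"### Normal Distribution\n",
"\n",
"A bell-shaped distribution that is symmetric about its mean. Because many naturally occurring data sets are approximately normal, it is among the most important distributions in statistics.\n",
"\n",
"**Probability density function**\n",
"\n",
"$$f(x) = \\frac{1}{\\sqrt{2\\pi}\\sigma}\\exp\\{-\\frac{(x-\\mu)^2}{2\\sigma^2}\\} \\quad (-\\infty < x < \\infty)$$\n",
"\n",
"**Expected value and variance**\n",
"\n",
"$$E(X) = \\mu \\quad V(X) = \\sigma^2$$\n",
"\n",
"**Standard normal distribution (Z distribution)**\n",
"\n",
"The standard normal distribution rescales the data to mean 0 and standard deviation 1, so that it can be used regardless of the data's original scale.\n",
"\n",
"Standardized value (Z-score)\n",
"\n",
"$$Z = \\frac{x-\\mu}{\\sigma}$$\n",
"\n",
"With the Z-score, data points from two groups with different means and standard deviations can be compared on a common basis, by locating each on the standard normal distribution. For example, suppose a company must choose between two applicants for a language training program, one with only a TOEIC score and one with only a TOEFL score. The raw scores cannot be compared directly, but converting each to a Z-score within its own test's distribution puts them on the same scale and makes the selection possible.\n",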
"\n",
"\n",
"\n",
"https://mathbitsnotebook.com/Algebra2/Statistics/STzScores.html\n",
"\n",
"**Skewness and kurtosis**\n",
"\n",
"Measures of how far the shape of a distribution departs from that of the normal distribution.\n",
"\n",
"Skewness\n",
"\n",
"$$S_w = \\frac{1}{n}\\sum_{i=1}^{n}(\\frac{x_i-\\mu}{\\sigma})^3$$ \n",
"\n",
"Kurtosis (excess kurtosis, which is 0 for a normal distribution)\n",
"\n",
"$$S_k = \\frac{1}{n}\\sum_{i=1}^{n}(\\frac{x_i-\\mu}{\\sigma})^4 - 3$$ \n",
"\n",
"\n",
"\n",
"https://www.researchgate.net/figure/Illustration-of-the-skewness-and-kurtosis-values-and-how-they-correlate-with-the-shape-of_fig1_298415862"
]
},
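{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Worked example (sketch)**\n",
"\n",
"As a rough illustration of the Z-score, assume exam scores with $\\mu = 80$ and $\\sigma = 5.2$ and a score of $x = 82$. These are the values used in a later code cell of this notebook. $\\Phi$ denotes the standard normal CDF, a symbol introduced only for this example.\n",
"\n",
"$$Z = \\frac{82 - 80}{5.2} \\approx 0.385$$\n",
"$$P(X \\leq 82) = \\Phi(0.385) \\approx 0.650$$\n",
"\n",
"Under these assumptions, a score of 82 lies about 0.385 standard deviations above the mean, at roughly the 65th percentile."
]
},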
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
""
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width\n",
"1 -0.90068 1.01900 -1.34023 -1.31544\n",
"2 -1.14302 -0.13198 -1.34023 -1.31544\n",
"3 -1.38535 0.32841 -1.39706 -1.31544\n",
"4 -1.50652 0.09822 -1.28339 -1.31544\n",
"5 -1.02185 1.24920 -1.34023 -1.31544"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"Sepal.Length 0.31491\n",
"Sepal.Width 0.31897\n",
"Petal.Length -0.27488\n",
"Petal.Width -0.10297\n",
"dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"Sepal.Length -0.55206\n",
"Sepal.Width 0.22825\n",
"Petal.Length -1.40210\n",
"Petal.Width -1.34060\n",
"dtype: float64"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Compute the Z-score of a single variable with SciPy\n",
"#sp.stats.zscore(df_iris['Sepal.Length'])\n",
"\n",
"# Pass zscore() to DataFrame.apply() to standardize every numeric column\n",
"df_zscore = df_iris.drop('Species', axis=1).apply(sp.stats.zscore)\n",
"display(df_zscore.head())\n",
"\n",
"# Compute skewness and excess kurtosis with pandas (numeric columns only)\n",
"display(df_iris.skew(numeric_only=True))\n",
"display(df_iris.kurtosis(numeric_only=True))"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"f(82)=0.071\n",
"F(82)=0.650\n",
"1 - F(82)=0.350\n",
"E(X)=80.000, V(X)=27.040\n",
"Top 10% point with ppf: 86.664\n",
"Top 10% point with isf: 86.664\n",
"Central 90% interval: 71.447 < x < 88.553\n",
"Bottom/top 5% regions with ppf: x < 71.447, 88.553 < x\n",
"Bottom/top 5% regions with isf: x < 71.447, 88.553 < x\n"
]
},
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution'), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# 시험점수가 N(80, 5.2)(평균:80, 표준편차:5.2)인 정규분포에서 82점일 확률\n",
"mu = 80\n",
"std = 5.2\n",
"# scipy 의 statistics 패키지를 이용하여 정규분포에서 원하는 값 계산\n",
"# https://docs.scipy.org/doc/scipy/tutorial/stats.html\n",
"rv = sp.stats.norm(loc=mu, scale=std)\n",
"\n",
"# 확률밀도함수(Probability density function)\n",
"print(f\"f(82)={rv.pdf(82.0):.3f}\") \n",
"# 누적분포함수(Cumulative distribution function): P(X <= 82): 82점 이하일 확률\n",
"print(f\"F(82)={rv.cdf(82.0):.3f}\")\n",
"# Survival Function (1-CDF): P(X >= 82): 82점 이상일 확률\n",
"print(f\"1 - F(82)={rv.sf(82.0):.3f}\")\n",
"# 기대값(Expted value), 분산 (Variance)\n",
"print(f\"E(X)={rv.mean():.3f}, V(X)={rv.var():.3f}\")\n",
"\n",
"# 상위 % 를 가지는 값 구하기 \n",
"# Percent Point Function (Inverse of CDF): P(X <= x) = 90% 일때, x의 값\n",
"print(f\"Top 10% point with ppf: {rv.ppf(0.90):.3f}\")\n",
"# Inverse Survival Function (Inverse of SF): P(X >= x) = 10% 일때, x의 값\n",
"print(f\"Top 10% point with isf: {rv.isf(0.10):.3f}\")\n",
"# 중위 90% 확률을 가지는 구간\n",
"print(f\"중위 90% interval: {rv.interval(0.9)[0]:.3f} < x < {rv.interval(0.9)[1]:.3f}\")\n",
"# 상위 5%, 하위 5% 점수 구간\n",
"print(f\"상하위 5% interval with ppf: x < {rv.ppf(0.05):.3f}, {rv.ppf(0.95):.3f} < x\")\n",
"print(f\"상하위 5% interval with isf: x < {rv.isf(0.95):.3f}, {rv.isf(0.05):.3f} < x\")\n",
"\n",
"rvs = rv.rvs(size=10000)\n",
"ax = sns.distplot(rvs, bins=100, kde=True, hist=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution', ylabel='Frequency')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution'), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# 표준정규분포 (Z-distribution) N(0,1)\n",
"rv = sp.stats.norm(loc=0, scale=1)\n",
"rvs = rv.rvs(size=10000)\n",
"ax = sns.distplot(rvs, bins=100, kde=True, hist=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution', ylabel='Frequency')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Q-Q Plot (Quantile-Quantile Plot)**\n",
"\n",
"정규분포인지를 살펴보기 위한 다른 시각화 방식은 Q-Q Plot 입니다. Q-Q Plot 은 정규분포와 실제 데이터의 같은 Quantile 상의 데이터를 샘플링해 point를 찍어 봄으로써 두 분포가 유사한지를 시각화하여 판단하는 방식입니다. point 들이 직선을 이룬다면 두 분포는 같은 정규분포일 가능성이 높습니다.\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"from statsmodels.graphics.gofplots import qqplot\n",
"\n",
"# q-q plot\n",
"qqplot(df_iris['Sepal.Length'], line='s')\n",
"plt.show()"
]
},
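{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the same idea in numeric form, assuming two hypothetical synthetic samples (`normal_sample`, `skewed_sample`) that are not part of the original notebook. `scipy.stats.probplot` returns the quantile pairs behind a Q-Q plot together with the correlation coefficient `r` of a straight-line fit; an `r` close to 1 corresponds to points lying near a straight line."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"# Hypothetical data for illustration: one normal sample, one clearly skewed sample\n",
"rng = np.random.default_rng(22)\n",
"normal_sample = rng.normal(loc=80, scale=5.2, size=200)\n",
"skewed_sample = rng.exponential(scale=0.5, size=200)\n",
"\n",
"for name, sample in [(\"normal\", normal_sample), (\"skewed\", skewed_sample)]:\n",
"    # probplot returns (theoretical quantiles, ordered data) and a least-squares line fit\n",
"    (osm, osr), (slope, intercept, r) = stats.probplot(sample, dist=\"norm\")\n",
"    print(f\"{name}: r={r:.4f}\")"
]
},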
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 지수분포 (Exponential Distribution)\n",
"\n",
"어떤 사건이 발생하는 간격(시간)이 따르는 분포\n",
"\n",
"**확률밀도함수**\n",
"\n",
"단위 시간당 평균 $\\lambda$번 발생하는 사건의 발생 간격을 따를 때, \n",
"\n",
"$$f(x) = \\lambda e^{-\\lambda x} \\quad (x \\geq 0)$$\n",
"\n",
"**기대값과 분산**\n",
"\n",
"$$E(X) = \\frac{1}{\\lambda} \\quad V(X) = \\frac{1}{\\lambda^2}$$\n",
"\n",
"\n",
"\n",
"\n",
"https://en.wikipedia.org/wiki/Exponential_distribution\n",
"\n",
"**Further Reading**\n",
"* [Exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"f(1)=0.271, f(3)=0.005\n",
"F(1)=0.865, F(3)=0.998\n",
"E(X)=0.500, V(X)=0.250\n"
]
},
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution'), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# 한달 평균 2건의 교통사고가 발생하는 지역에서 한달 교통사고 발생 간격\n",
"l = 2\n",
"rv = sp.stats.expon(scale=1.0/l)\n",
"\n",
"print(f\"f(1)={rv.pdf(1):.3f}, f(3)={rv.pdf(3):.3f}\")\n",
"# 1달안에 교통사고가 발생할 확률, 3달안에 교통사고가 발생할 확률\n",
"print(f\"F(1)={rv.cdf(1):.3f}, F(3)={rv.cdf(3):.3f}\")\n",
"print(f\"E(X)={rv.mean():.3f}, V(X)={rv.var():.3f}\")\n",
"\n",
"# 랜덤 샘플링 후 시각화\n",
"rvs = rv.rvs(size=10000)\n",
"ax = sns.distplot(rvs, bins=30, kde=True, hist=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution', ylabel='Frequency')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 카이제곱 분포 (Chi-squared($x^2$) Distribution)\n",
"\n",
"추정과 검정에 사용되는 특수한 확률분포 중 하나, 분산의 구간 추정이나 독립성 검정에 사용\n",
"\n",
"\n",
"\n",
"https://ko.wikipedia.org/wiki/%EC%B9%B4%EC%9D%B4%EC%A0%9C%EA%B3%B1_%EB%B6%84%ED%8F%AC\n",
"\n",
"**Further Reading**\n",
"* [카이제곱 분포](https://ko.wikipedia.org/wiki/%EC%B9%B4%EC%9D%B4%EC%A0%9C%EA%B3%B1_%EB%B6%84%ED%8F%AC)\n",
"* [Chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution'), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Chi-squared distribution degree of freedom = 3\n",
"rv = sp.stats.chi2(df=3)\n",
"\n",
"rvs = rv.rvs(size=10000)\n",
"ax = sns.distplot(rvs, bins=100, kde=True, hist=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution', ylabel='Frequency')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### T 분포 (T Distribution)\n",
"\n",
"추정과 검정에 사용되는 특수한 확률분포 중 하나, 모분산을 알 수 없고, 표본 크기가 작을 때 정규분포 Z값 대신 T 분포의 T값을 이용해 추정 또는 검정에 사용 \n",
"\n",
"$$t = \\frac{\\bar{x} - \\mu}{\\frac{s}{\\sqrt{n}}}$$\n",
"\n",
"\n",
"\n",
"https://en.wikipedia.org/wiki/Student%27s_t-distribution\n",
"\n",
"**Further Reading**\n",
"* [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution)\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution'), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# T-distribution degree of freedom = 3\n",
"rv = sp.stats.t(df=3)\n",
"\n",
"rvs = rv.rvs(size=10000)\n",
"ax = sns.distplot(rvs, bins=100, kde=True, hist=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution', ylabel='Frequency')"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution'), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# T-distribution degree of freedom = 30, 정규분포와 가까워짐\n",
"rv = sp.stats.t(df=30)\n",
"\n",
"rvs = rv.rvs(size=10000)\n",
"ax = sns.distplot(rvs, bins=100, kde=True, hist=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution', ylabel='Frequency')"
]
},
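{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the $t$ statistic from the T distribution section, assuming a small hypothetical sample (`sample`) and a hypothesized mean (`mu0`) that are not part of the original notebook. It computes $t$ by hand from $\\bar{x}$, $s$, and $n$, then compares the result with `scipy.stats.ttest_1samp`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"# Hypothetical sample and hypothesized population mean (illustration only)\n",
"sample = np.array([78.0, 85.0, 82.0, 90.0, 76.0, 81.0, 84.0])\n",
"mu0 = 80.0\n",
"\n",
"n = sample.size\n",
"x_bar = sample.mean()\n",
"s = sample.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)\n",
"\n",
"# t = (x_bar - mu) / (s / sqrt(n))\n",
"t_manual = (x_bar - mu0) / (s / np.sqrt(n))\n",
"# Two-sided p-value from the t distribution with n - 1 degrees of freedom\n",
"p_manual = 2 * stats.t(df=n - 1).sf(abs(t_manual))\n",
"\n",
"t_scipy, p_scipy = stats.ttest_1samp(sample, popmean=mu0)\n",
"print(f\"manual: t={t_manual:.3f}, p={p_manual:.3f}\")\n",
"print(f\"scipy : t={t_scipy:.3f}, p={p_scipy:.3f}\")"
]
},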
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### F 분포 (F Distribution)\n",
"\n",
"분산분석 등에서 사용되는 확률 분포\n",
"\n",
"확률변수 $Y_1$, $Y_2$는 서로 독립이고, 각 카이제곱분포를 따를 때($Y_1 \\sim x^2(n_1)$, $Y_2 \\sim x^2(n_2)$),\n",
"\n",
"$$F = \\frac{\\frac{Y_1}{n_1}}{\\frac{Y_2}{n_2}}$$\n",
"\n",
"의 확률분포를 자유도가 $n_1$, $n_2$인 F분포 $F(n_1, n_2)$"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Text(0.5, 0, 'Distribution'), Text(0, 0.5, 'Frequency')]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"rv = sp.stats.f(5, 10)\n",
"\n",
"rvs = rv.rvs(size=10000)\n",
"ax = sns.distplot(rvs, bins=100, kde=True, hist=False,\n",
" color='skyblue', hist_kws={\"linewidth\": 15,'alpha':1})\n",
"ax.set(xlabel='Distribution', ylabel='Frequency')"
]
},
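{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, assuming simulated data rather than anything from the original notebook: it builds the ratio $F = (Y_1/n_1)/(Y_2/n_2)$ from two independent chi-squared samples and compares the result with `scipy.stats.f`. The sample size, the seed, and the variable names (`y1`, `y2`, `f_samples`) are illustrative choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"rng = np.random.default_rng(22)  # arbitrary seed for reproducibility\n",
"n1, n2 = 5, 10                   # degrees of freedom of the two chi-squared variables\n",
"\n",
"# Independent chi-squared samples\n",
"y1 = stats.chi2(df=n1).rvs(size=100_000, random_state=rng)\n",
"y2 = stats.chi2(df=n2).rvs(size=100_000, random_state=rng)\n",
"\n",
"# F = (Y1 / n1) / (Y2 / n2)\n",
"f_samples = (y1 / n1) / (y2 / n2)\n",
"\n",
"rv = stats.f(n1, n2)\n",
"print(f\"simulated mean={f_samples.mean():.3f}, theoretical mean={rv.mean():.3f}\")\n",
"# Kolmogorov-Smirnov test against F(n1, n2); a large p-value is consistent with the definition\n",
"print(stats.kstest(f_samples, rv.cdf))"
]
},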
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.9"
},
"nikola": {
"category": "",
"date": "2019-02-24",
"description": "",
"link": "",
"slug": "ml-descriptive-statistics",
"tags": "",
"title": "Machine Learning - Descriptive Statistics",
"type": "text"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}