# 9.7. `statistics` - Mathematical statistics functions

New in version 3.4.

## 9.7.1. Averages and measures of central location

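| Function | Description |
| --- | --- |
| `mean()` | Arithmetic mean (“average”) of data. |
| `harmonic_mean()` | Harmonic mean of data. |
| `median()` | Median (middle value) of data. |
| `median_low()` | Low median of data. |
| `median_high()` | High median of data. |
| `median_grouped()` | Median, or 50th percentile, of grouped data. |
| `mode()` | Mode (most common value) of discrete data. |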

## 9.7.2. Measures of spread

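| Function | Description |
| --- | --- |
| `pstdev()` | Population standard deviation of data. |
| `pvariance()` | Population variance of data. |
| `stdev()` | Sample standard deviation of data. |
| `variance()` | Sample variance of data. |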

## 9.7.3. Function details

`statistics.mean(data)`

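Return the sample arithmetic mean of data, which can be a sequence or iterator. The arithmetic mean is the sum of the data divided by the number of data points. If data is empty, `StatisticsError` is raised.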

```
>>> from statistics import mean
>>> mean([1, 2, 3, 4, 4])
2.8
>>> mean([-1.0, 2.5, 3.25, 5.75])
2.625

>>> from fractions import Fraction as F
>>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
Fraction(13, 21)

>>> from decimal import Decimal as D
>>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
Decimal('0.5625')
```

The mean is strongly affected by outliers and is not a robust estimator for central location: the mean is not necessarily a typical example of the data points. For more robust, although less efficient, measures of central location, see `median()` and `mode()`. (In this case, “efficient” refers to statistical efficiency rather than computational efficiency.)
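A minimal sketch of the outlier effect described above. The data set is hypothetical, chosen only for illustration: adding a single extreme value drags the mean far from the bulk of the data, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical data: five typical values, then the same values plus one outlier.
typical = [10, 11, 12, 13, 14]
with_outlier = typical + [1000]

print(mean(typical), median(typical))            # 12 12
print(mean(with_outlier), median(with_outlier))  # mean about 176.67, median 12.5
```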

The sample mean gives an unbiased estimate of the true population mean, which means that, taken on average over all the possible samples, `mean(sample)` converges on the true mean of the entire population. If data represents the entire population rather than a sample, then `mean(data)` is equivalent to calculating the true population mean μ.
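The unbiasedness claim can be checked exactly on a tiny population by enumerating every possible sample. This sketch assumes a hypothetical population and sampling with replacement, so every ordered sample of size 2 is equally likely; `Fraction` keeps the arithmetic exact. Averaging the sample means over all samples recovers the population mean.

```python
from fractions import Fraction
from itertools import product
from statistics import mean

# Hypothetical population; Fractions keep every step exact.
population = [Fraction(x) for x in (2, 3, 7, 12)]
mu = mean(population)  # true population mean

# Every ordered sample of size 2, drawn with replacement, is equally likely.
samples = list(product(population, repeat=2))
average_of_sample_means = mean(mean(s) for s in samples)

print(mu, average_of_sample_means)  # 6 6
```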

`statistics.harmonic_mean(data)`

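Return the harmonic mean of data, a sequence or iterator of real-valued numbers. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the data; for example, the harmonic mean of three values a, b and c equals 3/(1/a + 1/b + 1/c). It is a type of average, a measure of the central location of the data, and is often appropriate when averaging quantities that are rates or ratios, such as speeds. Suppose an investor buys an equal value of shares in each of three companies, with price/earnings (P/E) ratios of 2.5, 3 and 10. The average P/E ratio of the portfolio is: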

```
>>> from statistics import harmonic_mean
>>> harmonic_mean([2.5, 3, 10])  # For an equal investment portfolio.
3.6
```

Using the arithmetic mean would give an average of about 5.167, which is too high.

New in version 3.6.
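To make the definition concrete, here is a sketch that computes the harmonic mean directly as the reciprocal of the mean of the reciprocals and compares it with `harmonic_mean()` on the values from the example above. The helper name is hypothetical, and it assumes all values are positive and non-zero; plain float arithmetic may differ from the library in the last digit.

```python
from statistics import harmonic_mean

def harmonic_mean_by_hand(values):
    """Hypothetical helper: reciprocal of the arithmetic mean of the reciprocals.

    Assumes every value is positive and non-zero.
    """
    values = list(values)
    mean_of_reciprocals = sum(1 / v for v in values) / len(values)
    return 1 / mean_of_reciprocals

values = [2.5, 3, 10]  # the data from the example above
print(harmonic_mean_by_hand(values))  # about 3.6
print(harmonic_mean(values))          # about 3.6
```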

`statistics.median(data)`

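Return the median (middle value) of numeric data, using the common “mean of middle two” method. If data is empty, `StatisticsError` is raised. data can be a sequence or iterator.

The median is a robust measure of central location and is less affected by the presence of outliers in your data. When the number of data points is odd, the middle data point is returned: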

```
>>> from statistics import median
>>> median([1, 3, 5])
3
```

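When the number of data points is even, the median is interpolated by taking the average of the two middle values:

```
>>> median([1, 3, 5, 7])
4.0
```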

If your data is ordinal (supports order operations) but not numeric (doesn’t support addition), you should use `median_low()` or `median_high()` instead.
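A short sketch of why ordinal data needs `median_low()` or `median_high()`. The word list is hypothetical: strings can be sorted, so the low and high medians simply pick a middle item, but with an even count `median()` has to average two strings, which raises `TypeError`.

```python
from statistics import median, median_high, median_low

# Hypothetical ordinal data: strings sort, but they cannot be averaged.
words = ["apple", "banana", "cherry", "date"]

# With an even count, these return one of the two middle items unchanged.
print(median_low(words), median_high(words))  # banana cherry

# median() would need ("banana" + "cherry") / 2, which is not defined for str.
try:
    median(words)
    median_failed = False
except TypeError:
    median_failed = True
print("median() raised TypeError:", median_failed)
```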

`statistics.median_low(data)`

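Return the low median of numeric data. If data is empty, `StatisticsError` is raised. data can be a sequence or iterator.

The low median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned.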

```
>>> from statistics import median_low
>>> median_low([1, 3, 5])
3
>>> median_low([1, 3, 5, 7])
3
```

`statistics.median_high(data)`

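Return the high median of data. If data is empty, `StatisticsError` is raised. data can be a sequence or iterator.

The high median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned.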

```
>>> from statistics import median_high
>>> median_high([1, 3, 5])
3
>>> median_high([1, 3, 5, 7])
5
```
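The low and high medians are always actual data points: after sorting, an odd count gives the single middle item for both, and an even count gives the lower or the upper of the two middle items. A sketch of that selection rule, using hypothetical helper names that are not part of `statistics`, compared against the library on an unsorted input:

```python
from statistics import StatisticsError, median_high, median_low

# Hypothetical helpers (not part of statistics) that mirror the selection rule.
def low_median_sketch(data):
    ordered = sorted(data)
    if not ordered:
        raise StatisticsError("no median for empty data")
    n = len(ordered)
    # Odd n: the single middle item. Even n: the lower of the two middle items.
    return ordered[n // 2] if n % 2 == 1 else ordered[n // 2 - 1]

def high_median_sketch(data):
    ordered = sorted(data)
    if not ordered:
        raise StatisticsError("no median for empty data")
    # Odd n: the single middle item. Even n: the upper of the two middle items.
    return ordered[len(ordered) // 2]

sample = [7, 1, 5, 3, 9, 2]  # unsorted on purpose; sorts to [1, 2, 3, 5, 7, 9]
print(low_median_sketch(sample), median_low(sample))    # 3 3
print(high_median_sketch(sample), median_high(sample))  # 5 5
```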

`statistics.median_grouped(data, interval=1)`

Return the median of grouped continuous data, calculated as the 50th percentile, using interpolation. If data is empty, `StatisticsError` is raised. data can be a sequence or iterator.

```
>>> from statistics import median_grouped
>>> median_grouped([52, 52, 53, 54])
52.5
```

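In the following example, the data are rounded so that each value represents the midpoint of a data class: 1 is the midpoint of the class from 0.5 to 1.5, 2 is the midpoint of the class from 1.5 to 2.5, and so on. The middle value falls somewhere in the class from 3.5 to 4.5, and interpolation is used to estimate it:

```
>>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
3.7
```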

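The optional argument interval represents the class interval and defaults to 1. Changing the class interval naturally changes the interpolation:

```
>>> median_grouped([1, 3, 3, 5, 7], interval=1)
3.25
>>> median_grouped([1, 3, 3, 5, 7], interval=2)
3.5
```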
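The interpolation can be written as L + interval * (n/2 - cf) / f, where L is the lower boundary of the class containing the middle value, cf is the number of data points in lower classes, and f is the number of points in that class. The sketch below uses a hypothetical helper name and assumes numeric data in which each value is the midpoint of its class; it follows the standard grouped-median formula and reproduces the results above, but the library's own code may differ in details such as type coercion.

```python
from bisect import bisect_left, bisect_right
from statistics import StatisticsError

def grouped_median_sketch(data, interval=1):
    """Hypothetical re-implementation of the grouped-median interpolation."""
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise StatisticsError("no median for empty data")
    if n == 1:
        return data[0]
    x = data[n // 2]                  # midpoint of the class holding the median
    lower_limit = x - interval / 2    # L: lower boundary of that class
    cf = bisect_left(data, x)         # points in classes below the median class
    f = bisect_right(data, x) - cf    # points inside the median class
    return lower_limit + interval * (n / 2 - cf) / f

print(grouped_median_sketch([52, 52, 53, 54]))                # 52.5
print(grouped_median_sketch([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]))  # 3.7
print(grouped_median_sketch([1, 3, 3, 5, 7], interval=2))     # 3.5
```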

CPython implementation detail: Under some circumstances, `median_grouped()` may coerce data points to floats. This behaviour is likely to change in the future.


`statistics.mode(data)`

Return the most common data point from discrete or nominal data. The mode (when it exists) is the most typical value, and is a robust measure of central location.

If data is empty, or if there is not exactly one most common value, `StatisticsError` is raised.

`mode` assumes discrete data, and returns a single value. This is the standard treatment of the mode as commonly taught in schools:

```
>>> from statistics import mode
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
3
```

The mode is unique in that it is the only statistic which also applies to nominal (non-numeric) data:

```
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'
```
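The rule stated above (return the single most common value, otherwise raise `StatisticsError`) can be sketched with `collections.Counter`. The helper name is hypothetical and it mirrors the behaviour documented on this page; note that Python 3.8 and later changed `mode()` to return the first mode encountered instead of raising on multimodal data.

```python
from collections import Counter
from statistics import StatisticsError

def unique_mode_sketch(data):
    """Hypothetical helper: return the single most common value, else raise."""
    counts = Counter(data).most_common()  # (value, count) pairs, most common first
    if not counts:
        raise StatisticsError("no mode for empty data")
    top_count = counts[0][1]
    winners = [value for value, count in counts if count == top_count]
    if len(winners) > 1:
        raise StatisticsError(
            f"no unique mode; found {len(winners)} equally common values"
        )
    return winners[0]

print(unique_mode_sketch([1, 1, 2, 3, 3, 3, 3, 4]))  # 3
print(unique_mode_sketch(["red", "blue", "blue", "red", "green", "red", "red"]))  # red

try:
    unique_mode_sketch([1, 1, 2, 2])
except StatisticsError as exc:
    print(exc)  # no unique mode; found 2 equally common values
```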
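
`statistics.pstdev(data, mu=None)`

Return the population standard deviation (the square root of the population variance). See `pvariance()` for arguments and other details.

```
>>> from statistics import pstdev
>>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
0.986893273527251
```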
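As a quick check of the relationship between the two population measures, this sketch compares `pstdev()` with the square root of `pvariance()` on the data from the example above. The two may differ in the last digit depending on the Python version, so the comment gives an approximate value.

```python
import math
from statistics import pstdev, pvariance

data = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]

# The population standard deviation is the square root of the population variance.
print(pstdev(data))                # about 0.98689
print(math.sqrt(pvariance(data)))  # the same value, up to rounding
```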

`statistics.pvariance(data, mu=None)`

Return the population variance of data, a non-empty iterable of real-valued numbers. Variance, or second moment about the mean, is a measure of the variability (spread or dispersion) of data. A large variance indicates that the data is spread out; a small variance indicates it is clustered closely around the mean.

If the optional second argument mu is given, it should be the mean of data. If it is missing or `None` (the default), the mean is automatically calculated.

```
>>> from statistics import mean, pvariance
>>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
>>> pvariance(data)
1.25
```

```
>>> mu = mean(data)
>>> pvariance(data, mu)
1.25
```

This function does not attempt to verify that you have passed the actual mean as mu. Using arbitrary values for mu may lead to invalid or impossible results.
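As a sketch of what is computed when mu is omitted: the population variance is the average squared deviation from the mean, sum((x - mu)**2) / n. The helper below is hypothetical and uses plain float arithmetic, whereas `statistics` works harder to avoid rounding error; on the data from the example above the two agree exactly.

```python
from statistics import pvariance

def population_variance_sketch(data):
    """Hypothetical helper: mean squared deviation from the data's own mean."""
    data = list(data)
    if not data:
        raise ValueError("population variance requires at least one data point")
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
print(population_variance_sketch(data))  # 1.25
print(pvariance(data))                   # 1.25
```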

```
>>> from decimal import Decimal as D
>>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
Decimal('24.815')

>>> from fractions import Fraction as F
>>> pvariance([F(1, 4), F(5, 4), F(1, 2)])
Fraction(13, 72)
```

If you somehow know the true population mean μ, you may use this function to calculate the variance of a sample, giving the known population mean as the second argument. Provided the data points are representative (e.g. independent and identically distributed), the result will be an unbiased estimate of the population variance.

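`statistics.stdev(data, xbar=None)`

Return the sample standard deviation (the square root of the sample variance). See `variance()` for arguments and other details.

```
>>> from statistics import stdev
>>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
1.0810874155219827
```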
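
`statistics.variance(data, xbar=None)`

Return the sample variance of data, an iterable of at least two real-valued numbers. If the optional second argument xbar is given, it should be the mean of data; if it is missing or `None` (the default), the mean is automatically calculated. Use this function when your data is a sample from a population; to calculate the variance of an entire population, see `pvariance()`. If data has fewer than two values, `StatisticsError` is raised.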

```
>>> from statistics import mean, variance
>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
>>> variance(data)
1.3720238095238095
```

```
>>> m = mean(data)
>>> variance(data, m)
1.3720238095238095
```

```
>>> from decimal import Decimal as D
>>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
Decimal('31.01875')

>>> from fractions import Fraction as F
>>> variance([F(1, 6), F(1, 2), F(5, 3)])
Fraction(67, 108)
```
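The sample variance differs from the population variance only in its divisor: it divides the sum of squared deviations by n - 1 instead of n (Bessel's correction). With `Fraction` inputs the relationship can be checked exactly; this sketch reuses the `Fraction` data from the example above.

```python
from fractions import Fraction as F
from statistics import pvariance, variance

data = [F(1, 6), F(1, 2), F(5, 3)]
n = len(data)

# variance() divides by n - 1 and pvariance() by n, so they differ by n / (n - 1).
print(variance(data))                 # 67/108
print(pvariance(data) * F(n, n - 1))  # 67/108
```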

## 9.7.4. Exceptions

exception `statistics.StatisticsError`

Subclass of `ValueError` for statistics-related exceptions.
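Because `StatisticsError` subclasses `ValueError`, code that already handles `ValueError` also catches it. A minimal sketch, triggering the error with empty data passed to `mean()`:

```python
from statistics import StatisticsError, mean

# StatisticsError subclasses ValueError, so either name can be caught.
print(issubclass(StatisticsError, ValueError))  # True

try:
    mean([])  # the mean of empty data is undefined
except ValueError as exc:
    caught = type(exc).__name__
print(caught)  # StatisticsError
```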