# HistogramThresholding.jl Documentation

`HistogramThresholding.Otsu`

— Type

```
t = find_threshold(histogram, edges, Otsu())
t = find_threshold(img, Otsu(); nbins = 256)
```

Under the assumption that the histogram is bimodal, the threshold is set so that the resulting between-category variance is maximal.

**Output**

Returns a real number `t` in `edges`. The `edges` parameter represents an `AbstractRange` which specifies the intervals associated with the histogram bins.

**Extended help**

**Details**

Let $f_i$ $(i=1 \ldots I)$ denote the number of observations in the $i$th bin of the histogram. Then the probability that an observation belongs to the $i$th bin is given by $p_i = \frac{f_i}{N}$ ($i = 1, \ldots, I$), where $N = \sum_{i=1}^{I}f_i$.

The choice of a threshold $T$ partitions the data into two categories, $C_0$ and $C_1$. Let

\[P_0(T) = \sum_{i = 1}^T p_i \quad \text{and} \quad P_1(T) = \sum_{i = T+1}^I p_i\]

denote the cumulative probabilities,

\[\mu_0(T) = \sum_{i = 1}^T i \frac{p_i}{P_0(T)} \quad \text{and} \quad \mu_1(T) = \sum_{i = T+1}^I i \frac{p_i}{P_1(T)}\]

denote the means, and

\[\sigma_0^2(T) = \sum_{i = 1}^T (i-\mu_0(T))^2 \frac{p_i}{P_0(T)} \quad \text{and} \quad \sigma_1^2(T) = \sum_{i = T+1}^I (i-\mu_1(T))^2 \frac{p_i}{P_1(T)}\]

denote the variances of categories $C_0$ and $C_1$, respectively. Furthermore, let

\[\mu = P_0(T)\mu_0(T) + P_1(T)\mu_1(T),\]

represent the overall mean,

\[\sigma_b^2(T) = P_0(T)(\mu_0(T) - \mu)^2 + P_1(T)(\mu_1(T) - \mu)^2,\]

the between-category variance, and

\[\sigma_w^2(T) = P_0(T) \sigma_0^2(T) + P_1(T)\sigma_1^2(T)\]

the within-category variance, respectively.

Finding the discrete value $T$ which maximises the function $\sigma_b^2(T)$ produces the sought-after threshold value (i.e. the bin which determines the threshold). Because the total variance $\sigma^2 = \sigma_w^2(T) + \sigma_b^2(T)$ does not depend on $T$, the same threshold is obtained by minimising the within-category variance $\sigma_w^2(T)$, or by maximising the ratio of between-category variance to within-category variance.
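As a rough illustration of the search described above, the following sketch scans every candidate bin and keeps the one with the largest between-category variance. It is written in plain Julia and is not the package's implementation: the function name `otsu_bin` and the toy histogram are assumptions made for this example, and the result is a bin index rather than a value in `edges`.

```julia
# Hypothetical helper (not part of HistogramThresholding): returns the bin
# index T that maximises the between-category variance.
function otsu_bin(f::AbstractVector{<:Real})
    p = f ./ sum(f)                          # p_i = f_i / N
    nb = length(p)
    mu = sum(i * p[i] for i in 1:nb)         # overall mean
    best_T, best_var = 1, -Inf
    P0 = 0.0                                 # running P_0(T)
    S0 = 0.0                                 # running sum of i * p_i for i <= T
    for T in 1:nb-1
        P0 += p[T]
        S0 += T * p[T]
        P1 = 1 - P0
        (P0 > 0 && P1 > 0) || continue       # skip empty categories
        mu0 = S0 / P0
        mu1 = (mu - S0) / P1
        var_b = P0 * (mu0 - mu)^2 + P1 * (mu1 - mu)^2
        if var_b > best_var                  # strict: ties keep the smallest T
            best_T, best_var = T, var_b
        end
    end
    return best_T
end

counts = [10, 20, 10, 0, 0, 10, 20, 10]      # toy bimodal histogram
T = otsu_bin(counts)                         # T == 3
```

Bins 3, 4 and 5 give the same between-category variance for this toy histogram because bins 4 and 5 are empty; the strict comparison keeps the first of them.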

**Arguments**

The function arguments are described in more detail below.

`histogram`

An `AbstractArray` storing the frequency distribution.

`edges`

An `AbstractRange` specifying how the intervals for the frequency distribution are divided.

**Example**

Compute the threshold for the "cameraman" image in the `TestImages` package.

```
using TestImages, ImageContrastAdjustment, HistogramThresholding
img = testimage("cameraman")
edges, counts = build_histogram(img,256)
#=
The `counts` array stores at index 0 the frequencies that were below the
first bin edge. Since we are seeking a threshold over the interval
partitioned by `edges` we need to discard the first bin in `counts`
so that the dimensions of `edges` and `counts` match.
=#
t = find_threshold(counts[1:end], edges, Otsu())
```

**Reference**

- Nobuyuki Otsu (1979). “A threshold selection method from gray-level histograms”. *IEEE Trans. Sys., Man., Cyber.* 9 (1): 62–66. doi:10.1109/TSMC.1979.4310076

`HistogramThresholding.MinimumIntermodes`

— Type

```
t = find_threshold(histogram, edges, MinimumIntermodes(); maxiter = 8000)
t = find_threshold(img, MinimumIntermodes(); nbins = 256)
```

Under the assumption that the histogram is bimodal, the histogram is smoothed using a length-3 mean filter until two modes remain. The threshold is then set to the minimum value between the two modes.

**Output**

Returns a real number `t` in `edges`. The `edges` parameter represents an `AbstractRange` which specifies the intervals associated with the histogram bins.

**Extended help**

**Details**

If after `maxiter` iterations the smoothed histogram is still not bimodal then the algorithm will fall back to using the `UnimodalRosin` method to select a threshold.
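The smoothing loop can be sketched in plain Julia as follows. This is only an illustration under stated assumptions, not the package's code: the helper names are hypothetical, the end bins are averaged with themselves, only strict interior local maxima count as modes, and the function returns a bin index (or `nothing` where the package would fall back to `UnimodalRosin`).

```julia
# One pass of a length-3 mean filter; the end bins are averaged with
# themselves so that the length of the histogram is preserved (assumption).
function smooth3(h::Vector{Float64})
    n = length(h)
    return [(h[max(i - 1, 1)] + h[i] + h[min(i + 1, n)]) / 3 for i in 1:n]
end

# Indices of strict local maxima among the interior bins.
local_maxima(h) = [i for i in 2:length(h)-1 if h[i-1] < h[i] > h[i+1]]

function minimum_between_modes(counts; maxiter = 8000)
    h = float.(counts)
    for _ in 1:maxiter
        modes = local_maxima(h)
        if length(modes) == 2
            lo, hi = modes
            return lo - 1 + argmin(h[lo:hi])   # bin index of the valley
        end
        h = smooth3(h)
    end
    return nothing   # still not bimodal after maxiter smoothing passes
end

counts = [1, 5, 9, 4, 6, 2, 1, 3, 8, 12, 7, 2]   # toy histogram with three peaks
valley = minimum_between_modes(counts)           # valley == 7
```

For this toy histogram a single smoothing pass merges the two left-hand peaks, leaving modes at bins 4 and 10 and a valley at bin 7.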

**Arguments**

The function arguments are described in more detail below.

`histogram`

An `AbstractArray` storing the frequency distribution.

`edges`

An `AbstractRange` specifying how the intervals for the frequency distribution are divided.

`maxiter`

An `Int` that specifies the maximum number of smoothing iterations. If left unspecified a default value of 8000 is used.

**Example**

Compute the threshold for the "cameraman" image in the `TestImages` package.

```
using TestImages, ImageContrastAdjustment, HistogramThresholding
img = testimage("cameraman")
edges, counts = build_histogram(img,256)
#=
The `counts` array stores at index 0 the frequencies that were below the
first bin edge. Since we are seeking a threshold over the interval
partitioned by `edges` we need to discard the first bin in `counts`
so that the dimensions of `edges` and `counts` match.
=#
t = find_threshold(counts[1:end], edges, MinimumIntermodes())
```

**Reference**

- C. A. Glasbey, “An Analysis of Histogram-Based Thresholding Algorithms,” *CVGIP: Graphical Models and Image Processing*, vol. 55, no. 6, pp. 532–537, Nov. 1993. doi:10.1006/cgip.1993.1040
- J. M. S. Prewitt and M. L. Mendelsohn, “THE ANALYSIS OF CELL IMAGES,” *Annals of the New York Academy of Sciences*, vol. 128, no. 3, pp. 1035–1053, Dec. 2006. doi:10.1111/j.1749-6632.1965.tb11715.x

`HistogramThresholding.Intermodes`

— Type

```
t = find_threshold(histogram, edges, Intermodes(maxiter=8000))
t = find_threshold(img, Intermodes(); nbins = 256)
```

Under the assumption that the histogram is bimodal, the histogram is smoothed using a length-3 mean filter until two modes remain. The threshold is then set to the average value of the two modes.

**Output**

Returns a real number `t` in `edges`. The `edges` parameter represents an `AbstractRange` which specifies the intervals associated with the histogram bins.

**Extended help**

**Details**

If after `maxiter` iterations the smoothed histogram is still not bimodal then the algorithm will fall back to using the `UnimodalRosin` method to select a threshold.
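Once two modes remain, the last step is a simple average. The sketch below assumes a histogram that is already bimodal and assumes that each bin is represented by the corresponding entry of `edges`; both the toy data and that mapping are illustrative choices, not a statement about the package's internals.

```julia
# Hypothetical illustration: average the positions of the two modes.
edges  = 0:10:90                            # one entry per bin (assumed layout)
counts = [1, 4, 9, 5, 2, 1, 3, 8, 6, 2]     # already bimodal: peaks at bins 3 and 8

# Strict interior local maxima of the histogram.
modes = [i for i in 2:length(counts)-1 if counts[i-1] < counts[i] > counts[i+1]]

# Threshold halfway between the two mode positions.
t = (edges[modes[1]] + edges[modes[2]]) / 2   # (20 + 70) / 2 == 45.0
```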

**Arguments**

The function arguments are described in more detail below.

`histogram`

An `AbstractArray` storing the frequency distribution.

`edges`

An `AbstractRange` specifying how the intervals for the frequency distribution are divided.

`maxiter`

An `Int` that specifies the maximum number of smoothing iterations. If left unspecified a default value of 8000 is used.

**Example**

Compute the threshold for the "cameraman" image in the `TestImages` package.

```
using TestImages, ImageContrastAdjustment, HistogramThresholding
img = testimage("cameraman")
edges, counts = build_histogram(img,256)
#=
The `counts` array stores at index 0 the frequencies that were below the
first bin edge. Since we are seeking a threshold over the interval
partitioned by `edges` we need to discard the first bin in `counts`
so that the dimensions of `edges` and `counts` match.
=#
t = find_threshold(counts[1:end], edges, Intermodes())
```

**Reference**

- C. A. Glasbey, “An Analysis of Histogram-Based Thresholding Algorithms,” CVGIP: Graphical Models and Image Processing, vol. 55, no. 6, pp. 532–537, Nov. 1993. doi:10.1006/cgip.1993.1040

`HistogramThresholding.MinimumError`

— Type

```
t = find_threshold(histogram, edges, MinimumError())
t = find_threshold(img, MinimumError(); nbins = 256)
```

Under the assumption that the histogram is a mixture of two Gaussian distributions, the threshold is chosen such that the expected misclassification error rate is minimised.

**Output**

Returns a real number `t` in `edges`. The `edges` parameter represents an `AbstractRange` which specifies the intervals associated with the histogram bins.

**Extended help**

**Details**

Let $f_i$ $(i=1 \ldots I)$ denote the number of observations in the $i$th bin of the histogram. Then the probability that an observation belongs to the $i$th bin is given by $p_i = \frac{f_i}{N}$ ($i = 1, \ldots, I$), where $N = \sum_{i=1}^{I}f_i$.

The minimum error thresholding method assumes that one can find a threshold $T$ which partitions the data into two categories, $C_0$ and $C_1$, such that the data can be modelled by a mixture of two Gaussian distributions. Let

\[P_0(T) = \sum_{i = 1}^T p_i \quad \text{and} \quad P_1(T) = \sum_{i = T+1}^I p_i\]

denote the cumulative probabilities,

\[\mu_0(T) = \sum_{i = 1}^T i \frac{p_i}{P_0(T)} \quad \text{and} \quad \mu_1(T) = \sum_{i = T+1}^I i \frac{p_i}{P_1(T)}\]

denote the means, and

\[\sigma_0^2(T) = \sum_{i = 1}^T (i-\mu_0(T))^2 \frac{p_i}{P_0(T)} \quad \text{and} \quad \sigma_1^2(T) = \sum_{i = T+1}^I (i-\mu_1(T))^2 \frac{p_i}{P_1(T)}\]

denote the variances of categories $C_0$ and $C_1$, respectively.

Kittler and Illingworth proposed to use the minimum error criterion function

\[J(T) = 1 + 2 \left[ P_0(T) \ln \sigma_0(T) + P_1(T) \ln \sigma_1(T) \right] - 2 \left[P_0(T) \ln P_0(T) + P_1(T) \ln P_1(T) \right]\]

to assess the discrepancy between the mixture of Gaussians implied by a particular threshold $T$, and the piecewise-constant probability density function represented by the histogram. The discrete value $T$ which minimises the function $J(T)$ produces the sought-after threshold value (i.e. the bin which determines the threshold).
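The criterion can be evaluated directly from its definition. The sketch below is a plain-Julia illustration rather than the package's implementation: `minimum_error_bin` is a hypothetical name, candidates with an empty or zero-variance category are skipped because $\ln 0$ is undefined, and the result is a bin index.

```julia
# Hypothetical helper: evaluates J(T) for every admissible T and returns
# the bin index that minimises it.
function minimum_error_bin(f::AbstractVector{<:Real})
    p = f ./ sum(f)
    nb = length(p)
    best_T, best_J = 0, Inf
    for T in 1:nb-1
        P0 = sum(p[1:T])
        P1 = sum(p[T+1:nb])
        (P0 > 0 && P1 > 0) || continue
        mu0 = sum(i * p[i] for i in 1:T) / P0
        mu1 = sum(i * p[i] for i in T+1:nb) / P1
        v0 = sum((i - mu0)^2 * p[i] for i in 1:T) / P0
        v1 = sum((i - mu1)^2 * p[i] for i in T+1:nb) / P1
        (v0 > 0 && v1 > 0) || continue      # ln(sigma) needs a positive variance
        s0, s1 = sqrt(v0), sqrt(v1)
        J = 1 + 2 * (P0 * log(s0) + P1 * log(s1)) -
                2 * (P0 * log(P0) + P1 * log(P1))
        if J < best_J
            best_T, best_J = T, J
        end
    end
    return best_T    # 0 if no admissible T exists
end

counts = [1, 4, 6, 4, 1, 1, 4, 6, 4, 1]     # two identical bell-shaped groups
T = minimum_error_bin(counts)               # T == 5
```

The quadratic cost of recomputing every sum for each $T$ is acceptable for a sketch; running sums would be the natural refinement.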

**Arguments**

The function arguments are described in more detail below.

`histogram`

An `AbstractArray` storing the frequency distribution.

`edges`

An `AbstractRange` specifying how the intervals for the frequency distribution are divided.

**Example**

Compute the threshold for the "cameraman" image in the `TestImages` package.

```
using TestImages, ImageContrastAdjustment, HistogramThresholding
img = testimage("cameraman")
edges, counts = build_histogram(img,256)
#=
The `counts` array stores at index 0 the frequencies that were below the
first bin edge. Since we are seeking a threshold over the interval
partitioned by `edges` we need to discard the first bin in `counts`
so that the dimensions of `edges` and `counts` match.
=#
t = find_threshold(counts[1:end], edges, MinimumError())
```

**References**

- J. Kittler and J. Illingworth, “Minimum error thresholding,” Pattern Recognition, vol. 19, no. 1, pp. 41–47, Jan. 1986. doi:10.1016/0031-3203(86)90030-0
- Q.-Z. Ye and P.-E. Danielsson, “On minimum error thresholding and its implementations,” Pattern Recognition Letters, vol. 7, no. 4, pp. 201–206, Apr. 1988. doi:10.1016/0167-8655(88)90103-1

`HistogramThresholding.Moments`

— Type

```
t = find_threshold(histogram, edges, Moments())
t = find_threshold(img, Moments(); nbins = 256)
```

The following rule determines the threshold: if one assigns all observations below the threshold to a value z₀ and all observations above the threshold to a value z₁, then the first three moments of the original histogram must match the moments of this specially constructed bilevel histogram.

**Output**

Returns a real number `t` in `edges`. The `edges` parameter represents an `AbstractRange` which specifies the intervals associated with the histogram bins.

**Extended help**

**Details**

Let $f_i$ $(i=1 \ldots I)$ denote the number of observations in the $i$th bin of the histogram and $z_i$ $(i=1 \ldots I)$ the observed value associated with the $i$th bin. Then the probability that an observation $z_i$ belongs to the $i$th bin is given by $p_i = \frac{f_i}{N}$ ($i = 1, \ldots, I$), where $N = \sum_{i=1}^{I}f_i$.

Moments can be computed from the histogram $f$ in the following way:

\[m_k = \frac{1}{N} \sum_{i=1}^{I} f_i (z_i)^k = \sum_{i=1}^{I} p_i (z_i)^k \quad k = 0,1,2,3, \ldots.\]

The principle of moment-preserving thresholding is to select a threshold value, as well as two representative values $z_0$ and $z_1$ ($z_0 < z_1$), such that if all below-threshold values in $f$ are replaced by $z_0$ and all above-threshold values are replaced by $z_1$, then this specially constructed bilevel histogram $g$ will have the same first three moments as $f$.

Concretely, let $q_0$ and $q_1$ denote the fractions of observations below and above the threshold in $f$, respectively. The constraint that the first three moments in $g$ must equal the first three moments in $f$, together with the normalisation $q_0 + q_1 = m_0 = 1$, can be expressed by the following system of four equations

\[\begin{aligned} q_0 (z_0)^0 + q_1 (z_1)^0 & = m_0 \\ q_0 (z_0)^1 + q_1 (z_1)^1 & = m_1 \\ q_0 (z_0)^2 + q_1 (z_1)^2 & = m_2 \\ q_0 (z_0)^3 + q_1 (z_1)^3 & = m_3 \\ \end{aligned}\]

where the left-hand side represents the moments of $g$ and the right-hand side represents the moments of $f$. To find the desired threshold value, one first solves the four equations for the unknowns $q_0$, $q_1$, $z_0$ and $z_1$, and then chooses the threshold $t$ such that $q_0 = \sum_{z_i \le t} p_i$.
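With $m_0 = 1$ the four equations admit a closed-form solution: $z_0$ and $z_1$ are the roots of $z^2 + c_1 z + c_0 = 0$, where $c_0$ and $c_1$ solve the linear system $c_0 + c_1 m_1 = -m_2$, $c_0 m_1 + c_1 m_2 = -m_3$. The sketch below works through this in plain Julia. The function name `moment_preserving` is hypothetical, and because an exact equality $q_0 = \sum_{z_i \le t} p_i$ is rarely attainable on discrete data, the sketch assumes the smallest value whose cumulative probability reaches $q_0$ is taken.

```julia
# Hypothetical helper illustrating the closed-form solution of the
# moment-preserving equations (with m_0 = 1); not the package's code.
function moment_preserving(f::AbstractVector{<:Real}, z::AbstractVector{<:Real})
    p = f ./ sum(f)
    m1, m2, m3 = [sum(p .* z .^ k) for k in 1:3]
    # z0 and z1 are the roots of z^2 + c1*z + c0 = 0
    d  = m2 - m1^2
    c0 = (m1 * m3 - m2^2) / d
    c1 = (m1 * m2 - m3) / d
    r  = sqrt(c1^2 - 4 * c0)
    z0, z1 = (-c1 - r) / 2, (-c1 + r) / 2
    # From q0 + q1 = 1 and q0*z0 + q1*z1 = m1
    q0 = (z1 - m1) / (z1 - z0)
    # Assumption: smallest value whose cumulative probability reaches q0
    t = z[findfirst(>=(q0), cumsum(p))]
    return (q0 = q0, z0 = z0, z1 = z1, t = t)
end

# A histogram that is already bilevel: values 1 and 5 with weights 3/4 and 1/4.
res = moment_preserving([3, 0, 0, 0, 1], 1:5)
# res.q0 ≈ 0.75, res.z0 ≈ 1, res.z1 ≈ 5
```

Feeding in a histogram that is already bilevel is a convenient sanity check, since the representative values and the fraction $q_0$ must then be recovered unchanged.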

**Arguments**

The function arguments are described in more detail below.

`histogram`

An `AbstractArray` storing the frequency distribution.

`edges`

An `AbstractRange` specifying how the intervals for the frequency distribution are divided.

**Example**

Compute the threshold for the "cameraman" image in the `TestImages` package.

```
using TestImages, ImageContrastAdjustment, HistogramThresholding
img = testimage("cameraman")
edges, counts = build_histogram(img,256)
#=
The `counts` array stores at index 0 the frequencies that were below the
first bin edge. Since we are seeking a threshold over the interval
partitioned by `edges` we need to discard the first bin in `counts`
so that the dimensions of `edges` and `counts` match.
=#
t = find_threshold(counts[1:end], edges, Moments())
```

**Reference**

[1] W.-H. Tsai, “Moment-preserving thresholding: A new approach,” Computer Vision, Graphics, and Image Processing, vol. 29, no. 3, pp. 377–393, Mar. 1985. doi:10.1016/0734-189x(85)90133-1

`HistogramThresholding.UnimodalRosin`

— Type

```
t = find_threshold(histogram, edges, UnimodalRosin())
t = find_threshold(img, UnimodalRosin(); nbins = 256)
```

Generates a threshold assuming a unimodal distribution using Rosin's algorithm.

**Output**

Returns a real number `t` in `edges`. The `edges` parameter represents an `AbstractRange` which specifies the intervals associated with the histogram bins.

**Extended help**

**Details**

This algorithm first selects the bin in the histogram with the highest frequency (the maximum bin). The algorithm then searches from the location of the maximum bin to the last bin of the histogram for the first bin with a frequency of 0 (known as the minimum bin). A line is then drawn that passes through both the maximum and minimum bins. The bin with the greatest orthogonal distance to the line is chosen as the threshold value.

**Assumptions**

This algorithm assumes that:

- The histogram is unimodal.
- There is always at least one bin that has a frequency of 0. If not, the algorithm will use the last bin as the minimum bin.

If the histogram includes multiple bins with a frequency of 0, the algorithm will select the first zero bin as its minimum. If there are multiple bins with the greatest orthogonal distance, the leftmost bin is selected as the threshold.
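The geometric construction can be sketched in plain Julia as follows. This is an illustration under stated assumptions, not the package's code: `rosin_bin` is a hypothetical name, only bins between the maximum and minimum bins are examined, and the result is a bin index.

```julia
# Hypothetical helper following the description above.
function rosin_bin(counts::AbstractVector{<:Real})
    n = length(counts)
    imax = argmax(counts)                        # bin with the highest frequency
    z = findfirst(iszero, counts[imax:end])      # first empty bin after the peak
    imin = z === nothing ? n : imax + z - 1      # otherwise use the last bin
    # Line through (imax, counts[imax]) and (imin, counts[imin]),
    # written as a*x + b*y + c = 0.
    a = counts[imax] - counts[imin]
    b = imin - imax
    c = -(a * imax + b * counts[imax])
    best, best_d = imax, -Inf
    for i in imax:imin
        d = abs(a * i + b * counts[i] + c) / hypot(a, b)   # orthogonal distance
        if d > best_d        # strict comparison keeps the leftmost maximiser
            best, best_d = i, d
        end
    end
    return best
end

counts = [2, 10, 6, 3, 1, 0, 0]   # toy unimodal histogram
T = rosin_bin(counts)             # T == 4
```

Here the maximum bin is 2, the minimum bin is 6, and bin 4 lies farthest below the line joining them.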

**Arguments**

The function arguments are described in more detail below.

`histogram`

An `AbstractArray` storing the frequency distribution.

`edges`

An `AbstractRange` specifying how the intervals for the frequency distribution are divided.

**Example**

Compute the threshold for the "moonsurface" image in the `TestImages` package.

```
using TestImages, ImageContrastAdjustment, HistogramThresholding
img = testimage("moonsurface")
edges, counts = build_histogram(img,256)
#=
The `counts` array stores at index 0 the frequencies that were below the
first bin edge. Since we are seeking a threshold over the interval
partitioned by `edges` we need to discard the first bin in `counts`
so that the dimensions of `edges` and `counts` match.
=#
t = find_threshold(counts[1:end], edges, UnimodalRosin())
```

**Reference**

- P. L. Rosin, “Unimodal thresholding,” Pattern Recognition, vol. 34, no. 11, pp. 2083–2096, Nov. 2001. doi:10.1016/s0031-3203(00)00136-9

`HistogramThresholding.Entropy`

— Type

```
find_threshold(counts, edges, Entropy())
t = find_threshold(img, Entropy(); nbins = 256)
```

An algorithm for finding the threshold value for a gray-level histogram using the entropy of the histogram.

**Output**

Returns the point in the `AbstractRange` which corresponds to the threshold bin in the histogram.

**Extended help**

**Details**

This algorithm uses the entropy of a gray level histogram to produce a threshold value.

Let $f_1, f_2, \ldots, f_I$ be the frequencies in the various bins of the histogram and $I$ the number of bins. With $N = \sum_{i=1}^{I}f_i$, let $p_i = \frac{f_i}{N}$ ($i = 1, \ldots, I$) denote the probability distribution of gray levels. From this distribution one derives two additional distributions: the first is defined for the discrete values $1$ to $s$ and the other for the values $s+1$ to $I$. These distributions are

\[A: \frac{p_1}{P_s}, \frac{p_2}{P_s}, \ldots, \frac{p_s}{P_s} \quad \text{and} \quad B: \frac{p_{s+1}}{1-P_s}, \ldots, \frac{p_I}{1-P_s} \quad \text{where} \quad P_s = \sum_{i=1}^{s}p_i.\]

The entropies associated with each distribution are as follows:

\[H(A) = \ln(P_s) + \frac{H_s}{P_s}\]

\[H(B) = \ln(1-P_s) + \frac{H_n-H_s}{1-P_s}\]

\[\quad \text{where} \quad H_s = -\sum_{i=1}^{s}p_i\ln{p_i} \quad \text{and} \quad H_n = -\sum_{i=1}^{I}p_i\ln{p_i}.\]

Combining these two entropy functions we have

\[\psi(s) = \ln(P_s(1-P_s)) + \frac{H_s}{P_s} + \frac{H_n-H_s}{1-P_s}.\]

Finding the discrete value $s$ which maximises the function $\psi(s)$ produces the sought-after threshold value (i.e. the bin which determines the threshold).

See Section 4 of [1] for more details on the derivation of the entropy.
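Because $P_s$ and $H_s$ are cumulative sums, $\psi(s)$ can be evaluated for all $s$ in a single pass. The following plain-Julia sketch illustrates this; `kapur_bin` is a hypothetical name, empty bins are handled with the usual convention $0 \ln 0 = 0$, and the result is a bin index rather than a point in `edges`.

```julia
# Hypothetical helper: evaluates psi(s) from running values of P_s and H_s
# and returns the maximising bin index.
function kapur_bin(f::AbstractVector{<:Real})
    p = f ./ sum(f)
    plogp = [x > 0 ? x * log(x) : 0.0 for x in p]   # convention: 0 ln 0 = 0
    Hn = -sum(plogp)                                # entropy of the full histogram
    best_s, best_psi = 0, -Inf
    Ps, Hs = 0.0, 0.0
    for s in 1:length(p)-1
        Ps += p[s]
        Hs -= plogp[s]
        (0 < Ps < 1) || continue                    # both distributions must be non-empty
        psi = log(Ps * (1 - Ps)) + Hs / Ps + (Hn - Hs) / (1 - Ps)
        if psi > best_psi
            best_s, best_psi = s, psi
        end
    end
    return best_s
end

s = kapur_bin([5, 5, 5, 5])   # uniform histogram: s == 2
```

For a uniform histogram with four bins, $\psi(2) = 2\ln 2$ exceeds $\psi(1) = \psi(3) = \ln 3$, so the split falls in the middle.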

**Options**

**Choices for counts**

You can specify an `AbstractArray` which should be a 1D array of frequencies for a histogram. You should submit the corresponding `edges` range for the bins of the histogram. The function will throw an error if it detects that the `edges` and `counts` have different lengths.

**Choices for edges**

You can specify an `AbstractRange` which should be the corresponding range for the bins of the histogram array passed into `counts`.

**Example**

```
using TestImages, ImageContrastAdjustment, HistogramThresholding
img = testimage("cameraman")
# building a histogram with 256 bins
edges, counts = build_histogram(img, 256)
#=
The `counts` array stores at index 0 the frequencies that were below the
first bin edge. Since we are seeking a threshold over the interval
partitioned by `edges` we need to discard the first bin in `counts`
so that the dimensions of `edges` and `counts` match.
=#
find_threshold(counts[1:end], edges, Entropy())
```

**References**

[1] J. N. Kapur, P. K. Sahoo, and A. K. C. Wong, “A new method for gray-level picture thresholding using the entropy of the histogram,” *Computer Vision, Graphics, and Image Processing*, vol. 29, no. 1, p. 140, Jan. 1985. doi:10.1016/s0734-189x(85)90156-2

`HistogramThresholding.Balanced`

— Type

```
t = find_threshold(histogram, edges, Balanced())
t = find_threshold(img, Balanced(); nbins = 256)
```

In balanced histogram thresholding, one interprets a bin as a physical weight with a mass equal to its occupancy count. The balanced histogram method involves iterating the following three steps: (1) choose the midpoint bin index as a "pivot", (2) compute the combined weight to the left and right of the pivot bin and (3) remove the leftmost bin if the left side is the heaviest, and the rightmost bin otherwise. The algorithm stops when only a single bin remains. The last bin determines the sought-after threshold.

**Output**

Returns a real number `t` in `edges`. The `edges` parameter represents an `AbstractRange` which specifies the intervals associated with the histogram bins.

**Extended help**

**Details**

Let $f_n$ ($n = 1 \ldots N$) denote the number of observations in the $n$th bin of the histogram. The balanced histogram method constructs a sequence of nested intervals

\[I_1 = [1,N] \cap \mathbb{Z} \supset I_2 \supset I_3 \supset \ldots \supset I_{N},\]

where for $k = 2 \ldots N$

\[I_k = \begin{cases} I_{k-1} \setminus \{\min \left( I_{k-1} \right) \} &\text{if } \sum_{n = \min \left( I_{k-1} \right)}^{I_m}f_n > \sum_{n = I_m + 1}^{ \max \left( I_{k-1} \right)} f_n, \\ I_{k-1} \setminus \{\max \left( I_{k-1} \right) \} &\text{otherwise}, \end{cases}\]

and $I_m = \lfloor \frac{1}{2}\left( \min \left( I_{k-1} \right) + \max \left( I_{k-1} \right) \right) \rfloor$. Each step removes exactly one bin from an interval that starts with $N$ bins, so the final interval $I_{N}$ consists of a single element which is the bin index corresponding to the desired threshold.

If one interprets a bin as a physical weight with a mass equal to its occupancy count, then each step of the algorithm can be conceptualised as removing the leftmost or rightmost bin to "balance" the resulting histogram on a pivot. The pivot is defined to be the midpoint between the start and end points of the interval under consideration.

If it turns out that the single element in $I_{N}$ equals $1$ or $N$ then the original histogram must have a single peak and the algorithm has failed to find a suitable threshold. In this case the algorithm will fall back to using the `UnimodalRosin` method to select the threshold.
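The interval recursion translates almost line by line into a loop. The sketch below is a plain-Julia illustration, not the package's implementation: `balanced_bin` is a hypothetical name, the sums are recomputed at every step for clarity, and the fallback to `UnimodalRosin` is left out.

```julia
# Hypothetical helper mirroring the interval recursion above.
function balanced_bin(f::AbstractVector{<:Real})
    lo, hi = 1, length(f)                 # current interval [lo, hi]
    while lo < hi
        mid = (lo + hi) ÷ 2               # pivot I_m
        left  = sum(f[lo:mid])            # weight of bins lo..mid
        right = sum(f[mid+1:hi])          # weight of bins mid+1..hi
        if left > right
            lo += 1                       # remove the leftmost bin
        else
            hi -= 1                       # remove the rightmost bin
        end
    end
    return lo                             # the single remaining bin
end

counts = [1, 3, 5, 2, 1, 4, 6, 2]         # toy histogram with peaks at bins 3 and 7
T = balanced_bin(counts)                  # T == 5
```

For the toy histogram the interval shrinks from both sides until only bin 5 remains, which is the valley between the two peaks.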

**Arguments**

The function arguments are described in more detail below.

`histogram`

An `AbstractArray` storing the frequency distribution.

`edges`

An `AbstractRange` specifying how the intervals for the frequency distribution are divided.

**Example**

Compute the threshold for the "cameraman" image in the `TestImages` package.

```
using TestImages, ImageContrastAdjustment, HistogramThresholding
img = testimage("cameraman")
edges, counts = build_histogram(img, 256)
#=
The `counts` array stores at index 0 the frequencies that were below the
first bin edge. Since we are seeking a threshold over the interval
partitioned by `edges` we need to discard the first bin in `counts`
so that the dimensions of `edges` and `counts` match.
=#
t = find_threshold(counts[1:end], edges, Balanced())
```

**Reference**

- “BI-LEVEL IMAGE THRESHOLDING - A Fast Method”, Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing, 2008. doi:10.5220/0001064300700076

`HistogramThresholding.Yen`

— Type

```
t = find_threshold(histogram, edges, Yen())
t = find_threshold(img, Yen(); nbins = 256)
```

Computes the threshold value using Yen's maximum correlation criterion for bilevel thresholding.

**Output**

Returns a real number `t` in `edges`. The `edges` parameter represents an `AbstractRange` which specifies the intervals associated with the histogram bins.

**Extended help**

**Details**

This algorithm uses the concept of *entropic correlation* of a gray level histogram to produce a threshold value.

Let $f_1, f_2, \ldots, f_I$ be the frequencies in the various bins of the histogram and $I$ the number of bins. With $N = \sum_{i=1}^{I}f_i$, let $p_i = \frac{f_i}{N}$ ($i = 1, \ldots, I$) denote the probability distribution of gray levels. From this distribution one derives two additional distributions: the first is defined for the discrete values $1$ to $s$ and the other for the values $s+1$ to $I$. These distributions are

\[A: \frac{p_1}{P_s}, \frac{p_2}{P_s}, \ldots, \frac{p_s}{P_s} \quad \text{and} \quad B: \frac{p_{s+1}}{1-P_s}, \ldots, \frac{p_I}{1-P_s} \quad \text{where} \quad P_s = \sum_{i=1}^{s}p_i.\]

The entropic correlations associated with each distribution are

\[C(A) = -\ln \sum_{i=1}^{s} \left( \frac{p_i}{P_s} \right)^2 \quad \text{and} \quad C(B) = -\ln \sum_{i=s+1}^{I} \left( \frac{p_i}{1 - P_s} \right)^2.\]

Combining these two entropic correlation functions we have

\[\psi(s) = -\ln \sum_{i=1}^{s} \left( \frac{p_i}{P_s} \right)^2 -\ln \sum_{i=s+1}^{I} \left( \frac{p_i}{1 - P_s} \right)^2.\]

Finding the discrete value $s$ which maximises the function $\psi(s)$ produces the sought-after threshold value (i.e. the bin which determines the threshold).
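The criterion can be evaluated straight from the two sums. The following plain-Julia sketch is an illustration only: `yen_bin` is a hypothetical name, candidates for which either distribution would be empty are skipped, and the result is a bin index.

```julia
# Hypothetical helper: evaluates psi(s) = C(A) + C(B) for every admissible s
# and returns the maximising bin index.
function yen_bin(f::AbstractVector{<:Real})
    p = f ./ sum(f)
    nb = length(p)
    best_s, best_psi = 0, -Inf
    for s in 1:nb-1
        Ps = sum(p[1:s])
        (0 < Ps < 1) || continue                        # both parts must be non-empty
        CA = -log(sum((p[i] / Ps)^2 for i in 1:s))
        CB = -log(sum((p[i] / (1 - Ps))^2 for i in s+1:nb))
        if CA + CB > best_psi
            best_s, best_psi = s, CA + CB
        end
    end
    return best_s
end

s = yen_bin(fill(2, 6))   # uniform histogram with six bins: s == 3
```

For a uniform histogram $\psi(s) = \ln s + \ln(I - s)$, which is largest when the two parts contain the same number of bins.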

**Arguments**

The function arguments are described in more detail below.

`histogram`

An `AbstractArray` storing the frequency distribution.

`edges`

An `AbstractRange` specifying how the intervals for the frequency distribution are divided.

**Example**

Compute the threshold for the "cameraman" image in the `TestImages` package.

```
using TestImages, ImageContrastAdjustment, HistogramThresholding
img = testimage("cameraman")
edges, counts = build_histogram(img, 256)
#=
The `counts` array stores at index 0 the frequencies that were below the
first bin edge. Since we are seeking a threshold over the interval
partitioned by `edges` we need to discard the first bin in `counts`
so that the dimensions of `edges` and `counts` match.
=#
t = find_threshold(counts[1:end], edges, Yen())
```

**Reference**

- Yen JC, Chang FJ, Chang S (1995), “A New Criterion for Automatic Multilevel Thresholding”, IEEE Trans. on Image Processing 4 (3): 370-378, doi:10.1109/83.366472

`HistogramThresholding.ThresholdAPI.find_threshold`

— Function

```
find_threshold(data::AbstractArray, f::AbstractThresholdAlgorithm; nbins)
find_threshold(histogram::AbstractArray, edges::AbstractArray, f::AbstractThresholdAlgorithm)
```

Find a suitable threshold in `data` using algorithm `f` upon constructing a histogram with `nbins` bins. Instead of specifying the raw `data`, you can specify a histogram and accompanying edges directly.

**Output**

A real number representing a threshold that can be used to split data into two parts.

**Examples**

Pass an algorithm to `find_threshold`:

```
using TestImages, HistogramThresholding
img = testimage("cameraman")
f = Otsu()   # any of the algorithm types documented above
t = find_threshold(img, f; nbins = 64)
```

```
using StatsBase, HistogramThresholding
data = vcat(ones(50,), zeros(50,))
h = fit(Histogram, data)
f = Otsu()   # any of the algorithm types documented above
t = find_threshold(data, f; nbins = 2)
```

`HistogramThresholding.ThresholdAPI.build_histogram`

— Function

```
edges, count = build_histogram(img) # For 8-bit images only
edges, count = build_histogram(img, nbins)
edges, count = build_histogram(img, nbins; minval, maxval)
edges, count = build_histogram(img, edges)
```

Generates a histogram for the image over `nbins` spread between `[minval, maxval]`. Color images are automatically converted to grayscale.

**Output**

Returns `edges` which is an `AbstractRange` type that specifies how the interval `[minval, maxval]` is divided into bins, and an array `count` which records the concomitant bin frequencies. In particular, `count` has the following properties:

- `count[0]` is the number satisfying `x < edges[1]`
- `count[i]` is the number of values `x` that satisfy `edges[i] <= x < edges[i+1]`
- `count[end]` is the number satisfying `x >= edges[end]`.
- `length(count) == length(edges)+1`.

**Details**

One can consider a histogram as a piecewise-constant model of a probability density function $f$ [1]. Suppose that $f$ has support on some interval $I = [a,b]$. Let $m$ be an integer and $a = a_1 < a_2 < \ldots < a_m < a_{m+1} = b$ a sequence of real numbers. Construct a sequence of intervals

\[I_1 = [a_1,a_2], I_2 = (a_2, a_3], \ldots, I_{m} = (a_m,a_{m+1}]\]

which partition $I$ into subsets $I_j$ $(j = 1, \ldots, m)$ on which $f$ is constant. These subsets satisfy $I_i \cap I_j = \emptyset, \forall i \neq j$, and are commonly referred to as *bins*. Together they encompass the entire range of data values such that $\sum_j |I_j | = | I |$. Each bin has width $w_j = |I_j| = a_{j+1} - a_j$ and height $h_j$ which is the constant probability density over the region of the bin. Integrating the constant probability density over the width of the bin $w_j$ yields a probability mass of $\pi_j = h_j w_j$ for the bin.

For a sample $x_1, x_2, \ldots, x_N$, let

\[n_j = \sum_{n = 1}^{N}\mathbf{1}_{(I_j)}(x_n), \quad \text{where} \quad \mathbf{1}_{(I_j)}(x) = \begin{cases} 1 & \text{if} \; x \in I_j,\\ 0 & \text{otherwise}, \end{cases}\]

represent the number of samples falling into the interval $I_j$. An estimate for the probability mass of the $j$th bin is given by the relative frequency $\hat{\pi}_j = \frac{n_j}{N}$, and the histogram estimator of the probability density function is defined as

\[\begin{aligned} \hat{f}_n(x) & = \sum_{j = 1}^{m}\frac{n_j}{Nw_j} \mathbf{1}_{(I_j)}(x) \\ & = \sum_{j = 1}^{m}\frac{\hat{\pi}_j}{w_j} \mathbf{1}_{(I_j)}(x) \\ & = \sum_{j = 1}^{m}\hat{h}_j \mathbf{1}_{(I_j)}(x). \end{aligned}\]

The function $\hat{f}_n(x)$ is a genuine density estimator because $\hat{f}_n(x) \ge 0$ and

\[\begin{aligned} \int_{-\infty}^{\infty}\hat{f}_n(x) \operatorname{d}x & = \sum_{j=1}^{m} \frac{n_j}{Nw_j} w_j \\ & = 1. \end{aligned}\]
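The estimator and its normalisation can be checked numerically. The sketch below is a plain-Julia illustration and does not reproduce `build_histogram`: `histogram_density` is a hypothetical helper that takes the breakpoints $a_1 < \ldots < a_{m+1}$ directly, ignores samples outside $[a_1, a_{m+1}]$, and takes $N$ to be the number of samples that were counted.

```julia
# Hypothetical helper: heights of the histogram density estimator for the
# intervals I_1 = [a_1, a_2], I_j = (a_j, a_{j+1}] (j > 1).
function histogram_density(x::AbstractVector{<:Real}, a::AbstractVector{<:Real})
    m = length(a) - 1
    n = zeros(Int, m)                       # bin counts n_j
    for v in x
        (a[1] <= v <= a[end]) || continue   # ignore samples outside the support
        # searchsortedfirst gives the first index i with a[i] >= v
        j = v == a[1] ? 1 : searchsortedfirst(a, v) - 1
        n[j] += 1
    end
    w = diff(a)                             # bin widths w_j
    return n ./ (sum(n) .* w)               # heights n_j / (N * w_j)
end

a = [0.0, 0.5, 1.0, 2.0]                    # unequal bin widths
x = [0.0, 0.1, 0.5, 0.6, 0.7, 1.0, 1.5, 2.0]
h = histogram_density(x, a)                 # h ≈ [0.75, 0.75, 0.25]
total = sum(h .* diff(a))                   # total ≈ 1
```

The unequal bin widths show why the heights, not the relative frequencies, are what integrates to one.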

**Options**

Various options for the parameters of this function are described in more detail below.

**Choices for nbins**

You can specify the number of discrete bins for the histogram. When specifying the number of bins consider the maximum number of graylevels that your image type supports. For example, with an image of type `N0f8` there is a maximum of 256 possible graylevels. Hence, if you request more than 256 bins for that type of image you should expect to obtain zero counts for numerous bins.

**Choices for minval**

You have the option to specify the lower bound of the interval over which the histogram will be computed. If `minval` is not specified then the minimum value present in the image is taken as the lower bound.

**Choices for maxval**

You have the option to specify the upper bound of the interval over which the histogram will be computed. If `maxval` is not specified then the maximum value present in the image is taken as the upper bound.

**Choices for edges**

If you do not designate the number of bins, nor the lower or upper bound of the interval, then you have the option to directly stipulate how the intervals will be divided by specifying an `AbstractRange` type.

**Example**

Compute the histogram of a grayscale image.

```
using TestImages, ImageContrastAdjustment
img = testimage("mandril_gray");
edges, counts = build_histogram(img, 256, minval = 0, maxval = 1)
```

Given a color image, compute the histogram of the red channel.

```
using TestImages, ImageContrastAdjustment, ImageCore   # ImageCore provides `red`
img = testimage("mandrill")
r = red.(img)
edges, counts = build_histogram(r, 256, minval = 0, maxval = 1)
```

**References**

[1] E. Herrholz, "Parsimonious Histograms," Ph.D. dissertation, Inst. of Math. and Comp. Sci., University of Greifswald, Greifswald, Germany, 2011.