
Distance-based test for uncertainty hypothesis testing

Abstract

Background

If an appropriate probability distribution cannot be identified for a given situation, it becomes extremely difficult to draw reliable inferences about the domain under investigation, because the statistical theory of hypothesis testing cannot be meaningfully employed in such cases. To deal with these situations, Liu (2007) recommended uncertainty theory as an alternative, and testing the validity of hypotheses about uncertainty distributions is currently receiving the attention of researchers.

Methods

In this paper, a new procedure based on inputs from one or more domain experts is proposed for testing uncertain hypotheses about the true uncertainty distribution function. The proposed method can also be used for testing uncertain hypotheses about the equality of two uncertainty distribution functions.

Results

Illustrative examples are provided to demonstrate the utility of the proposed test procedure.

Conclusions

The same methodology can be used for testing the equality of two uncertainty distributions by making use of the ratio used in the construction of the test.

Background

Testing of statistical hypotheses is a major branch of classical statistical inference. It deals with developing appropriate procedures for testing the validity of statistical hypotheses. Statistical hypotheses are statements about characteristics of real-life situations modeled by probability distributions, and a statistical test helps the decision maker decide, on the basis of sampled observations, whether to accept or reject a given hypothesis. The theory of testing statistical hypotheses is built on probability theory.

There are several real-life situations where it is very difficult to identify appropriate models (probability distributions) describing the probabilistic properties of the phenomena under study. Further, collecting enough sampled data to characterize the probability distribution fully is not always feasible. To deal with such situations, Liu [1] introduced a new theory called the Theory of Uncertainty, and further refinements of the theory were carried out by Liu [2]. For more details about the Theory of Uncertainty and its applications in various fields of research, see [2]. The online resource [3] is an excellent source of information on the latest developments in Uncertainty Theory.

It is well known that probability distributions are the backbone of statistical inference, which helps practitioners study the inherent characteristics of a given situation. Inferences about the system require knowledge of the parameters of the underlying probability distributions, for which several solutions are available in the literature. Just as probability distributions play a crucial role in stochastic situations, uncertainty distributions play a significant role in the Theory of Uncertainty: they model the nature of the uncertainty present in the given system. Several uncertainty distributions and their properties are described in [3]. These distributions involve unknown constants, and practitioners need to know these quantities to study the nature of the uncertainty. Liu [2] suggested a procedure for estimating the parameters of an uncertainty distribution, which was followed by the works of Wang and Peng [4] and Wang et al. [5]. Recently, Wang et al. [6] introduced an uncertain hypothesis testing procedure to test the equality of two uncertainty distributions.

In this paper, a new test procedure is introduced for testing whether a specified uncertainty distribution function can be the true uncertainty distribution function of the given system. The proposed procedure uses a distance based on the empirical comprehensive uncertainty distribution defined by Liu [2]. It can be modified suitably to handle the situation in which one is interested in testing the equality of two uncertainty distributions. The paper is organized as follows. The second section briefly introduces uncertainty theory and uncertainty distributions. The third section explains the test procedure of Wang et al. [6] and introduces the new test for hypotheses about uncertainty distributions. Illustrations are given in the fourth section, and conclusions are provided in the fifth section.

Methods

Uncertainty distributions

Let $\Gamma$ be a nonempty set and $\mathcal{L}$ be a $\sigma$-algebra over $\Gamma$. Elements of $\mathcal{L}$ are called events. An uncertainty measure $\mathcal{M}$ is a function from $\mathcal{L}$ to $[0,1]$ that measures the degree of belief associated with an event. It was initially introduced as a function from $\mathcal{L}$ to $[0,1]$ satisfying the normality, monotonicity, self-duality, and countable subadditivity axioms [1]. Later, Liu [3] refined the definition and defined an uncertainty measure as one satisfying the normality, duality, and subadditivity axioms. A measurable function $\xi$ from the uncertainty space $(\Gamma,\mathcal{L},\mathcal{M})$ to the set of real numbers is called an uncertain variable. The uncertainty distribution $\Phi:\mathbb{R}\to[0,1]$ of an uncertain variable $\xi$ is defined by $\Phi(x) = \mathcal{M}\{\xi \le x\}$ for any $x \in \mathbb{R}$. According to Peng and Iwamura [7], a sufficient and necessary condition for a function $\Phi:\mathbb{R}\to[0,1]$ to be an uncertainty distribution function is that it is an increasing function, except for the cases $\Phi(x) \equiv 0$ and $\Phi(x) \equiv 1$.

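Examples of uncertainty distribution functions [3] include the following.

  1. The normal uncertainty distribution, denoted by $\mathcal{N}(c,\sigma)$ with $c \in \mathbb{R}$ and $\sigma > 0$, is defined as

    $$\Phi(x) = \left(1 + \exp\left(\frac{\pi(c - x)}{\sqrt{3}\,\sigma}\right)\right)^{-1}, \quad x \in \mathbb{R}. \tag{1}$$

  2. The lognormal uncertainty distribution, denoted by $\mathcal{LOGN}(c,\sigma)$ with $c \in \mathbb{R}$ and $\sigma > 0$, is defined as

    $$\Phi(x) = \left(1 + \exp\left(\frac{\pi(c - \ln x)}{\sqrt{3}\,\sigma}\right)\right)^{-1}, \quad x \ge 0. \tag{2}$$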
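For readers who want to evaluate Eqs. (1) and (2) numerically, here is a minimal Python sketch of both distributions. The names `normal_ud` and `lognormal_ud` are hypothetical; the expression $1/(1 + e^{z})$ is merely rearranged to avoid floating-point overflow for large $|z|$, and returning 0 at $x = 0$ for the lognormal case is an assumption of this sketch (the limit of Eq. (2) as $x$ approaches 0 from above).

```python
import math

SQRT3 = math.sqrt(3.0)


def _inv_one_plus_exp(z: float) -> float:
    """Return 1 / (1 + exp(z)) without overflowing for large |z|."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def normal_ud(x: float, c: float, sigma: float) -> float:
    """Normal uncertainty distribution N(c, sigma), Eq. (1)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return _inv_one_plus_exp(math.pi * (c - x) / (SQRT3 * sigma))


def lognormal_ud(x: float, c: float, sigma: float) -> float:
    """Lognormal uncertainty distribution LOGN(c, sigma), Eq. (2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x <= 0:
        return 0.0  # assumed: limit of Eq. (2) as x -> 0 from above
    return _inv_one_plus_exp(math.pi * (c - math.log(x)) / (SQRT3 * sigma))


print(normal_ud(2.0, 2.0, 1.0))  # 0.5, since the exponent vanishes at x = c
```

In the same way, `lognormal_ud` returns 0.5 at $x = e^{c}$, which is a quick sanity check on any fitted parameters.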

Liu [2] gave a method of computing the empirical uncertainty distribution function from data collected from an expert. Assume that the expert's experimental data $(x_1,\alpha_1),(x_2,\alpha_2),\dots,(x_n,\alpha_n)$ satisfy the consistency condition $x_1 < x_2 < \dots < x_n$, $0 \le \alpha_1 \le \alpha_2 \le \dots \le \alpha_n \le 1$.

Then, the empirical uncertainty distribution is computed by

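$$\hat{\Phi}(x) = \begin{cases} 0, & \text{if } x < x_1,\\[4pt] \alpha_i + \dfrac{(\alpha_{i+1} - \alpha_i)(x - x_i)}{x_{i+1} - x_i}, & \text{if } x_i \le x \le x_{i+1},\ 1 \le i < n,\\[4pt] 1, & \text{if } x > x_n. \end{cases} \tag{3}$$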
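Eq. (3) is linear interpolation between consecutive expert points, with value 0 to the left of $x_1$ and 1 to the right of $x_n$. A minimal Python sketch, assuming the data arrive as two parallel sequences that already satisfy the consistency condition (the name `empirical_ud` is hypothetical):

```python
from bisect import bisect_right
from typing import Callable, Sequence


def empirical_ud(xs: Sequence[float], alphas: Sequence[float]) -> Callable[[float], float]:
    """Return the empirical uncertainty distribution of Eq. (3) for one expert.

    xs must be strictly increasing and alphas non-decreasing within [0, 1].
    """
    if len(xs) != len(alphas) or len(xs) < 2:
        raise ValueError("need at least two (x, alpha) pairs")
    if any(a >= b for a, b in zip(xs, xs[1:])):
        raise ValueError("x values must be strictly increasing")
    if any(a > b for a, b in zip(alphas, alphas[1:])) or alphas[0] < 0 or alphas[-1] > 1:
        raise ValueError("belief degrees must be non-decreasing and lie in [0, 1]")

    def phi_hat(x: float) -> float:
        if x < xs[0]:
            return 0.0
        if x > xs[-1]:
            return 1.0
        if x == xs[-1]:
            return float(alphas[-1])
        i = bisect_right(xs, x) - 1  # index with xs[i] <= x < xs[i + 1]
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return alphas[i] + (alphas[i + 1] - alphas[i]) * t

    return phi_hat


teacher1 = empirical_ud([60, 70, 80, 85, 90], [0.05, 0.15, 0.55, 0.85, 0.95])
print(teacher1(75))
```

With teacher 1's data from Example 1, the sketch returns 0.35 at $x = 75$ (up to floating-point rounding), midway between the points (70, 0.15) and (80, 0.55).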

To distinguish the empirical uncertainty distribution function from the true uncertainty distribution $\Phi$, we denote the former by $\hat{\Phi}$. When experimental data are collected from $m$ experts, the empirical comprehensive uncertainty distribution function is obtained as a convex combination of the empirical uncertainty distributions computed for the individual experts. That is, we compute the empirical uncertainty distribution functions $\hat{\Phi}_1, \hat{\Phi}_2, \dots, \hat{\Phi}_m$ from the data of the $m$ experts using the definition above, and then combine them as follows to obtain the empirical comprehensive uncertainty distribution function

$$\hat{\Phi}(x) = w_1\hat{\Phi}_1(x) + w_2\hat{\Phi}_2(x) + \dots + w_m\hat{\Phi}_m(x), \tag{4}$$

where $\sum_{i=1}^{m} w_i = 1$ and $w_i \ge 0$, $i = 1, 2, \dots, m$.
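Eq. (4) is a weighted average of the individual empirical functions. A minimal sketch, assuming each expert's $\hat{\Phi}_i$ is already available as a Python callable (for instance, built from Eq. (3)); `comprehensive_ud`, `expert_a`, and `expert_b` are hypothetical names used only for illustration:

```python
import math
from typing import Callable, Sequence

Dist = Callable[[float], float]


def comprehensive_ud(phis: Sequence[Dist], weights: Sequence[float]) -> Dist:
    """Empirical comprehensive uncertainty distribution, Eq. (4)."""
    if not phis or len(phis) != len(weights):
        raise ValueError("need exactly one weight per expert")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")

    def phi_hat(x: float) -> float:
        # Convex combination of the experts' empirical distributions at x.
        return sum(w * phi(x) for w, phi in zip(weights, phis))

    return phi_hat


def expert_a(x: float) -> float:
    return min(max(x, 0.0), 1.0)  # hypothetical expert: linear on [0, 1]


def expert_b(x: float) -> float:
    return 0.0 if x < 0.5 else 1.0  # hypothetical expert: jump at 0.5


combined = comprehensive_ud([expert_a, expert_b], [0.5, 0.5])
print(combined(0.25), combined(0.75))  # 0.125 0.875
```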

It is pertinent to note that this convex combination is also an empirical uncertainty distribution as proved by Peng and Iwamura [7].

Test for uncertainty distribution hypotheses

In this section, we develop a test procedure for testing hypotheses about uncertainty distributions. An uncertain hypothesis is a hypothesis about uncertainty distributions that characterize uncertain situations.

Wang et al. [6] suggested a method of uncertain hypothesis testing, based on uncertainty theory, to test whether two uncertainty distributions are equal. They considered two sets of expert's data, $(x_1^{(1)},\alpha_1^{(1)}),(x_2^{(1)},\alpha_2^{(1)}),\dots,(x_m^{(1)},\alpha_m^{(1)})$ and $(x_1^{(2)},\alpha_1^{(2)}),(x_2^{(2)},\alpha_2^{(2)}),\dots,(x_n^{(2)},\alpha_n^{(2)})$, that meet the consistency conditions

$$x_1^{(1)} < x_2^{(1)} < \dots < x_m^{(1)}, \quad 0 \le \alpha_1^{(1)} \le \alpha_2^{(1)} \le \dots \le \alpha_m^{(1)} \le 1,$$
$$x_1^{(2)} < x_2^{(2)} < \dots < x_n^{(2)}, \quad 0 \le \alpha_1^{(2)} \le \alpha_2^{(2)} \le \dots \le \alpha_n^{(2)} \le 1.$$

It is presumed that the theoretical uncertainty distributions underlying the two sets of expert's data are $F_1(x)$ and $F_2(x)$. To test the null uncertain hypothesis $H_0: F_1(x) = F_2(x)$ for all $x \in \mathbb{R}$ against the alternative uncertain hypothesis $H_1: F_1(x) \ne F_2(x)$ for some $x \in \mathbb{R}$, Wang et al. [6] constructed a test procedure based on points randomly generated from the two empirical uncertainty distribution functions corresponding to the two experts' data.

In this paper, we construct a test of the uncertain hypothesis that a given function $F_1$ is the true uncertainty distribution function for the situation of interest, against the alternative hypothesis that $F_2$ is the true uncertainty distribution function. That is, we consider testing $H_0: F(x) = F_1(x)$ for all $x$ against $H_1: F(x) = F_2(x)$ for all $x$, where $F_1$ and $F_2$ are known theoretical uncertainty distribution functions.

The test procedure assumes that the data for the testing problem are collected from $m$ experts. Each expert gives an opinion as a sequence of numbers together with the corresponding belief degrees; the sequences need not have the same length for all experts, but the numbers in every sequence are expected to cover the possible values of the quantity of interest. The data collected from the $m$ experts can be described as in Table 1. It is pertinent to note that, as in formulating hypotheses in the classical theory of testing statistical hypotheses, the functions specified under the null and alternative hypotheses must be chosen judiciously, taking into account the chance environment and the experts' opinions. Issues related to this aspect are discussed in the illustrative examples.

Table 1 Experimental data for m experts

Using the information in the $m$ rows of Table 1, we compute the empirical uncertainty distribution functions $\hat{\Phi}_1, \hat{\Phi}_2, \dots, \hat{\Phi}_m$ as defined in the previous section and combine them through a convex combination to obtain the empirical comprehensive uncertainty distribution function of Eq. (4).

Let $A_r$ be the set of numbers reported by expert $r$ ($r = 1, 2, \dots, m$) and let $S = \bigcup_{r=1}^{m} A_r$. Summing over all $x \in S$, we compute the distance between the empirical comprehensive uncertainty distribution and the uncertainty distribution function specified under the null hypothesis, namely,

$$d(\hat{\Phi}, F_1) = \sum_{x \in S} d_x\big(\hat{\Phi}(x), F_1(x)\big),$$

where $d_x\big(\hat{\Phi}(x), F_1(x)\big) = \big(\hat{\Phi}(x) - F_1(x)\big)^2$ for all $x \in S$.

Similarly, we find the distance between the empirical comprehensive uncertainty distribution and the uncertainty distribution function given under the alternative hypothesis using

$$d(\hat{\Phi}, F_2) = \sum_{x \in S} d_x\big(\hat{\Phi}(x), F_2(x)\big),$$

where $d_x\big(\hat{\Phi}(x), F_2(x)\big) = \big(\hat{\Phi}(x) - F_2(x)\big)^2$ for all $x \in S$.
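Both distances are sums of squared pointwise differences over the pooled set $S$. A minimal sketch, assuming $\hat{\Phi}$, $F_1$, and $F_2$ are available as Python callables; the helper names `pooled_points` and `distance` are hypothetical:

```python
from typing import Callable, Iterable

Dist = Callable[[float], float]


def pooled_points(*expert_xs: Iterable[float]) -> list[float]:
    """S: the union of the sets A_r of values reported by the experts."""
    return sorted(set().union(*expert_xs))


def distance(phi_hat: Dist, f: Dist, s: Iterable[float]) -> float:
    """d(Phi_hat, F): sum over x in S of the squared difference d_x."""
    # set() guards against counting a repeated value of S twice.
    return sum((phi_hat(x) - f(x)) ** 2 for x in set(s))


# Union of the teachers' score sets used later in Example 1.
S = pooled_points([60, 70, 80, 85, 90], [60, 70, 75, 80, 85, 90], [50, 60, 70, 80, 85])
print(S)  # [50, 60, 70, 75, 80, 85, 90]
```

Calling `distance(phi_hat, F1, S)` and `distance(phi_hat, F2, S)` then gives the two quantities defined above.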

If the empirical comprehensive uncertainty distribution function, based on the opinions of several domain experts, is treated as an appropriate estimate of the true distribution, then it is reasonable to develop a test based on it. If the distribution specified under the null hypothesis is correct, we expect the distance between $\hat{\Phi}$ and $F_1$ to be small relative to the distance between $\hat{\Phi}$ and $F_2$. Hence, it is reasonable to reject the null hypothesis when

$$\frac{d(\hat{\Phi}, F_1)}{d(\hat{\Phi}, F_2)} > k, \tag{5}$$

where k is the largest real number satisfying the inequality

$$\sigma(R_k) \ge \alpha.$$

Here, $R_k = \left\{x \in S : \dfrac{d_x\big(\hat{\Phi}(x), F_1(x)\big)}{d_x\big(\hat{\Phi}(x), F_2(x)\big)} > k\right\}$ and $\sigma(R_k) = N(R_k)/N(S)$, where $N(\cdot)$ denotes the number of elements in a set. Thus $\sigma(R_k)$ is the proportion of points in $S = \bigcup_{r=1}^{m} A_r$ at which the ratio of the distance between the empirical comprehensive uncertainty distribution function and the distribution specified under the null hypothesis to the corresponding distance for the distribution specified under the alternative hypothesis exceeds the threshold $k$. The constant $\alpha$ is predetermined by the user. Because $\sigma(R_k)$ is non-increasing in $k$, a value of $\alpha$ closer to 1 requires more points to satisfy the condition defining $R_k$, which forces a smaller $k$ and leads to a higher chance of rejection. Similarly, a value of $\alpha$ closer to 0 allows a larger $k$ and leads to a lower chance of rejection. The practitioner has to choose $\alpha$ judiciously, striking a balance between the rejection rate and the chance of taking a correct decision.
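The decision rule can be sketched in Python as follows. The paper reports $\sigma(R_k)$ over a range of $k$ values (Tables 2 and 3), so this sketch likewise searches a user-supplied grid of candidate thresholds and keeps the largest one with $\sigma(R_k) \ge \alpha$. The grid, the function names, and the requirement that $d_x(\hat{\Phi}(x), F_2(x)) > 0$ at every point of $S$ are assumptions of the sketch, not part of the original procedure.

```python
from typing import Sequence


def pointwise_ratios(d1: Sequence[float], d2: Sequence[float]) -> list[float]:
    """Ratios d_x(Phi_hat(x), F1(x)) / d_x(Phi_hat(x), F2(x)) over the points of S.

    Assumes every denominator is positive; the text does not say how to
    handle a point where F2 coincides exactly with Phi_hat.
    """
    if len(d1) != len(d2) or not d1:
        raise ValueError("need one pair of pointwise distances per point of S")
    if any(b <= 0 for b in d2):
        raise ValueError("zero denominator: F2 equals Phi_hat at some point of S")
    return [a / b for a, b in zip(d1, d2)]


def sigma_r(ratios: Sequence[float], k: float) -> float:
    """sigma(R_k) = N(R_k) / N(S): share of points whose ratio exceeds k."""
    return sum(r > k for r in ratios) / len(ratios)


def choose_k(ratios: Sequence[float], alpha: float, grid: Sequence[float]) -> float:
    """Largest k on a (hypothetical) user-supplied grid with sigma(R_k) >= alpha."""
    feasible = [k for k in grid if sigma_r(ratios, k) >= alpha]
    if not feasible:
        raise ValueError("no grid value of k satisfies sigma(R_k) >= alpha")
    return max(feasible)


def reject_h0(d_f1: float, d_f2: float, k: float) -> bool:
    """Rule (5): reject H0 when d(Phi_hat, F1) / d(Phi_hat, F2) > k."""
    return d_f1 / d_f2 > k


# Toy pointwise distances, made up purely for illustration.
ratios = pointwise_ratios([1.0, 4.0, 9.0, 0.5], [1.0, 1.0, 1.0, 1.0])
k = choose_k(ratios, alpha=0.5, grid=[0.5 * j for j in range(1, 10)])
print(k, reject_h0(3.0, 1.0, k))  # 3.5 False
```

With these toy values, half of the four points have a ratio above $k$ only while $k < 4$, so the largest grid value satisfying $\sigma(R_k) \ge 0.5$ is 3.5.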

Results and discussion

To illustrate the process of developing a test procedure using the above method, two examples are considered.

Example 1

The data given in [6] are based on the knowledge and experience of three teachers who analyzed the degree of difficulty of a higher mathematics examination. Their estimated average scores and belief degrees are given below.

  • Teacher 1: (60, 0.05), (70, 0.15), (80, 0.55), (85, 0.85), (90, 0.95)

  • Teacher 2: (60, 0.08), (70, 0.17), (75, 0.36), (80, 0.58), (85, 0.85), (90, 0.95)

  • Teacher 3: (50, 0.2), (60, 0.3), (70, 0.4), (80, 0.8), (85, 1)

As mentioned earlier, the most important task in testing uncertain hypotheses is formulating the null and alternative hypotheses meaningfully. This has two stages: identifying a suitable uncertainty distribution for the given situation (e.g., zigzag, normal, or lognormal) and choosing the parameter values to be used under the null and alternative hypotheses. Both can be accomplished using the works of Liu [3] and Wang and Peng [4] on estimating uncertainty distributions.

Using the MATLAB Uncertainty Toolbox (http://orsc.edu.cn/liu/resources.htm), normal and lognormal uncertainty distributions were fitted to the three data sets by the least squares method of Liu [3]. Since the lognormal fits have smaller errors than the normal fits for all three data sets, we decided to use the lognormal distribution under both the null and alternative hypotheses. Figures 1, 2, and 3 show the fitted lognormal uncertainty distribution functions together with the experimental data of the three teachers. The least squares estimates for the lognormal fit are (C = 4.3605, σ = 0.1010) for teacher 1, (C = 4.3520, σ = 0.1095) for teacher 2, and (C = 4.2227, σ = 0.2271) for teacher 3. As in the classical statistical theory of testing hypotheses, it is left to the practitioner's discretion to decide which distributions to use under the null and alternative hypotheses, based on his or her belief and observation of the given system. Here, the distributions were chosen by considering the distances between the fitted distributions and the empirical comprehensive uncertainty distribution as defined in the third section: of the two fitted distributions with the smallest distances, one is used under the null hypothesis and the other under the alternative hypothesis. Thus, we consider testing the null uncertain hypothesis

$$H_0: \Phi(x) = \left(1 + \exp\left(\frac{\pi(4.3605 - \ln x)}{\sqrt{3} \times 0.1010}\right)\right)^{-1}$$

against the alternative uncertain hypothesis

$$H_1: \Phi(x) = \left(1 + \exp\left(\frac{\pi(4.3520 - \ln x)}{\sqrt{3} \times 0.1095}\right)\right)^{-1}.$$
Figure 1

Fitted lognormal for Wang et al. [6] data of teacher 1 with C = 4.3605 and σ = 0.1010.

Figure 2

Fitted lognormal for Wang et al. [6] data of teacher 2 with C = 4.3546 and σ = 0.1109.

Figure 3

Fitted lognormal for Wang et al. [6] data of teacher 3 with C = 4.2227 and σ = 0.2271.

We begin formulating the test procedure by computing the empirical comprehensive uncertainty distribution from the experimental data of the three teachers:

$$\hat{\Phi}(x) = w_1\hat{\Phi}_1(x) + w_2\hat{\Phi}_2(x) + w_3\hat{\Phi}_3(x).$$

Here, $\hat{\Phi}_1$, $\hat{\Phi}_2$, and $\hat{\Phi}_3$ are the empirical uncertainty distributions based on the data of the first, second, and third teachers, and the weights $w_1$, $w_2$, and $w_3$ are non-negative with $w_1 + w_2 + w_3 = 1$. For simplicity, we take $w_1 = w_2 = w_3 = 1/3$. For the given data, $A_1 = \{60, 70, 80, 85, 90\}$, $A_2 = \{60, 70, 75, 80, 85, 90\}$, and $A_3 = \{50, 60, 70, 80, 85\}$, so that $S = \{50, 60, 70, 75, 80, 85, 90\}$.

Therefore, we get

$$d(\hat{\Phi}, F_1) = \sum_{x \in S} d_x\big(\hat{\Phi}(x), F_1(x)\big) = 0.06359114,$$
$$d(\hat{\Phi}, F_2) = \sum_{x \in S} d_x\big(\hat{\Phi}(x), F_2(x)\big) = 0.04387163.$$

Hence, $d(\hat{\Phi}, F_1)/d(\hat{\Phi}, F_2) = 1.449482$.
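The two distances can be recomputed from the teachers' data with a short, self-contained Python sketch. It assumes equal weights $w_1 = w_2 = w_3 = 1/3$, the fitted parameters used in $H_0$ and $H_1$, and the squared-difference distance defined in the third section; the helper names are hypothetical. Run as written, it should agree with the values above to about four decimal places.

```python
import math

# (score, belief degree) pairs reported by the three teachers in Example 1.
TEACHERS = [
    [(60, 0.05), (70, 0.15), (80, 0.55), (85, 0.85), (90, 0.95)],
    [(60, 0.08), (70, 0.17), (75, 0.36), (80, 0.58), (85, 0.85), (90, 0.95)],
    [(50, 0.2), (60, 0.3), (70, 0.4), (80, 0.8), (85, 1.0)],
]
WEIGHTS = (1 / 3, 1 / 3, 1 / 3)


def lognormal_ud(x: float, c: float, sigma: float) -> float:
    """Lognormal uncertainty distribution of Eq. (2), for x > 0."""
    z = math.pi * (c - math.log(x)) / (math.sqrt(3.0) * sigma)
    return 1.0 / (1.0 + math.exp(z))


def empirical_at(pairs, x: float) -> float:
    """Empirical uncertainty distribution of Eq. (3) for one teacher, at x."""
    if x < pairs[0][0]:
        return 0.0
    if x > pairs[-1][0]:
        return 1.0
    for (x0, a0), (x1, a1) in zip(pairs, pairs[1:]):
        if x0 <= x <= x1:
            return a0 + (a1 - a0) * (x - x0) / (x1 - x0)
    return pairs[-1][1]  # only reached when a teacher gives a single point


def comprehensive(x: float) -> float:
    """Eq. (4) with equal weights."""
    return sum(w * empirical_at(p, x) for w, p in zip(WEIGHTS, TEACHERS))


def f1(x: float) -> float:  # distribution under H0
    return lognormal_ud(x, 4.3605, 0.1010)


def f2(x: float) -> float:  # distribution under H1
    return lognormal_ud(x, 4.3520, 0.1095)


S = sorted({x for pairs in TEACHERS for x, _ in pairs})
d1 = sum((comprehensive(x) - f1(x)) ** 2 for x in S)
d2 = sum((comprehensive(x) - f2(x)) ** 2 for x in S)
print(f"d1 = {d1:.6f}, d2 = {d2:.6f}, ratio = {d1 / d2:.4f}")
```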

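Table 2 gives $\sigma(R_k)$ for different values of $k$. If the user chooses $\alpha = 0.4$, then $k$ is taken as 2.5. Since the observed ratio 1.449482 does not exceed 2.5, this choice leads to the acceptance of the hypothesis $H_0: \Phi(x) = \left(1 + \exp\left(\frac{\pi(4.3605 - \ln x)}{\sqrt{3} \times 0.1010}\right)\right)^{-1}$.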

Table 2 Range of $k$ and $\sigma(R_k)$ for Wang et al. [6] data

Example 2

In Example 1, since the lognormal uncertainty distribution fitted all three data sets well, we used the same distribution under the null and alternative hypotheses. Now we consider a testing problem in which different distributions are used under the null and alternative hypotheses.

This example is based on the data provided in Chapter 4 of [3]. The expert's experimental data are given below:

(0.6, 0.1), (1.0, 0.3), (1.5, 0.4), (2.0, 0.6), (2.8, 0.8), (3.6, 0.9).

Using the MATLAB Uncertainty Toolbox (http://orsc.edu.cn/liu/resources.htm), lognormal and normal uncertainty distributions were fitted to the above data set by the least squares method. Figures 4 and 5 show the observed data points together with the fitted distributions.

Figure 4

Fitted lognormal for Liu [3] data with C = 0.4825 and σ = 0.7852.

Figure 5

Fitted normal for Liu [3] data with C = 1.7690 and σ = 1.2953.

The least squares estimates for the lognormal and normal fits are (C = 0.4825, σ = 0.7852) and (C = 1.7690, σ = 1.2953), respectively, and the corresponding errors are 0.0081 and 0.0074. We decide to test the null uncertain hypothesis

$$H_0: \Phi(x) = \left(1 + \exp\left(\frac{\pi(0.4825 - \ln x)}{\sqrt{3} \times 0.7852}\right)\right)^{-1}$$

against the alternative uncertain hypothesis

$$H_1: \Phi(x) = \left(1 + \exp\left(\frac{\pi(1.7690 - x)}{\sqrt{3} \times 1.2953}\right)\right)^{-1}.$$

A testing problem of this kind is meaningful in such situations because the errors do not differ much. If the difference between the errors is considerably large, one can simply adopt the distribution function with the smaller error as suitable for the given uncertain situation, without relying on any test procedure.

Since the data come from only one expert, the test uses the empirical uncertainty distribution computed from that expert's experimental data.

Here, $A = S = \{0.6, 1, 1.5, 2, 2.8, 3.6\}$,

$$d(\hat{\Phi}, F_1) = \sum_{x \in S} d_x\big(\hat{\Phi}(x), F_1(x)\big) = 0.0073954,$$
$$d(\hat{\Phi}, F_2) = \sum_{x \in S} d_x\big(\hat{\Phi}(x), F_2(x)\big) = 0.0080926.$$

Therefore, $d(\hat{\Phi}, F_1)/d(\hat{\Phi}, F_2) = 0.913856$.

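Table 3 gives $\sigma(R_k)$ for different values of $k$. If the user chooses $\alpha = 0.4$, then $k$ must be taken as 0.7. Since the observed ratio 0.913856 exceeds 0.7, this choice leads to the rejection of the hypothesis $H_0: \Phi(x) = \left(1 + \exp\left(\frac{\pi(0.4825 - \ln x)}{\sqrt{3} \times 0.7852}\right)\right)^{-1}$.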

Table 3 Range of $k$ and $\sigma(R_k)$ for Liu [3] data

Conclusions

In this paper, a new test procedure that uses data gathered from one or more domain experts has been developed for testing whether a specified uncertainty distribution can be the true uncertainty distribution function of the given situation. Two illustrative examples are provided using the data sets available in [6] and [3]. The first example deals with the case where both the null and alternative hypotheses use the lognormal uncertainty distribution, whereas the second considers a testing problem where the lognormal and normal uncertainty distributions are used under the null and alternative hypotheses, respectively.

It is pertinent to note that the same methodology can be used for testing the equality of two uncertainty distributions by using the ratio on which the test in the third section is constructed. The decision to accept or reject the null hypothesis can then be based on the same ratio, $d(\hat{\Phi}, F_1)/d(\hat{\Phi}, F_2)$; in this case, however, the null hypothesis is rejected if the ratio is either very small or very large.

Authors’ information

First author SS is currently Professor of Statistics at the University of Madras, Chennai, India. He has more than 30 years of experience in teaching and research. His main areas of research are Inference for Finite Populations, Classical Statistical Inference, and Data Mining. He has published more than 30 research articles in highly rated journals. Second author BR is a full-time research scholar working toward her Ph.D. degree in the Department of Statistics, University of Madras, Chennai, India. Her research interests include Fuzzy and Rough Set theories.

References

  1. Liu B: Uncertainty theory. 2nd edition. Berlin: Springer-Verlag; 2007.

  2. Liu B: A branch of mathematics for modeling human uncertainty. Berlin: Springer-Verlag; 2011.

  3. Liu B: Uncertainty theory. 4th edition. 2013. http://orsc.edu.cn/liu/ut.pdf. Accessed January 2013.

  4. Wang X, Peng Z: Method of moments for estimation uncertainty distribution. 2012. http://orsc.edu.cn/online/100408.pdf. Accessed January 2013.

  5. Wang X, Gao Z, Guo H: Delphi method for estimating uncertainty distributions. Information: An International Interdisciplinary Journal 2012,12(2):449–460.

  6. Wang X, Gao Z, Guo H: Uncertain hypothesis testing for expert’s empirical data. Math. Comput. Model. 2012, 55: 1478–1482. 10.1016/j.mcm.2011.10.039

  7. Peng Z, Iwamura K: A sufficient and necessary condition of uncertainty distribution. J. Interdiscipl. Math. 2010, 13: 277–285. 10.1080/09720502.2010.10700701

Acknowledgments

The authors wish to thank the referees for their comments and suggestions, which led to considerable improvement in the contents as well as the overall organization of the manuscript.

Author information

Corresponding author

Correspondence to Sundaram Sampath.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contribution

SS has initiated the research work, devised the testing criteria introduced in this paper, and guided RB in working out the illustrative examples. RB has prepared the initial draft of the manuscript and contributed in numerical work. SS has suggested the sequence alignment and helped in preparing the final version of the manuscript. Both authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Sampath, S., Ramya, B. Distance-based test for uncertainty hypothesis testing. J. Uncertain. Anal. Appl. 1, 4 (2013). https://doi.org/10.1186/2195-5468-1-4

  • DOI: https://doi.org/10.1186/2195-5468-1-4

Keywords