If you want to get help directly related to a page/query, use the Info Pages / Medium Info Pages. Direct links are available at each (Query) Page.

If you are completely lost, here is a link to a short description of what CSB.DB is and is not.


### Correlation Coefficient

Correlation coefficients measure the magnitude and direction of the association between two variables. The correlation between them reflects the degree to which both are related. Correlation coefficients range from +1 to -1. The closer the correlation is to either +1 or -1, the stronger the relationship. If the correlation is near zero, there is little or no (linear or monotonic) association between the two variables.

The magnitude is the strength of the correlation (relationship), that is, the strength of the tendency of two variables, X and Y, to move in the same (or opposite) direction. The direction of the correlation, given by its sign, indicates how the two variables are related. If the correlation is positive, the two variables move in the same direction: as one variable increases, so does the other. If the correlation is negative, they move in opposite directions: as one variable increases, the other decreases.

### Spearman's r`s`

Spearman's r`s` is a non-parametric measure of correlation. The Spearman correlation is based on the ranked observations of the two variables and therefore makes no assumption about the distribution of the values. It ranges from +1 to -1. Spearman's r`s` is the non-parametric 'variant' of the Pearson correlation, computed on ranked data, and can detect linear and non-linear relationships as long as they are monotonic (continuously increasing or decreasing). Spearman's r`s` is robust to outliers and does not require bivariate normally distributed observations.

**Computation:** Spearman's r`s` was computed as Pearson's correlation on ranked expression measurements according to Sokal & Rohlf (1995). Confidence interval and power of test were determined as suggested by Bonett & Wright (2000).
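The idea of "Pearson on ranks" can be illustrated with a minimal sketch. It is not the CSB.DB implementation: the helper names (`average_ranks`, `pearson`, `spearman`) and the example values are hypothetical, and giving tied values the mean of their ranks is an assumption made here for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s: Pearson correlation of the ranked observations."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression measurements: gene_b rises monotonically,
# but not linearly, with gene_a.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 8.0, 27.0, 64.0, 125.0]

print("Spearman:", spearman(gene_a, gene_b))
print("Pearson: ", pearson(gene_a, gene_b))
```

In this example the ranks of both variables are identical, so the rank correlation is perfect, while the Pearson correlation of the raw values stays below 1 because the relationship is not linear.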

### Kendall's τ

Kendall's τ is a non-parametric measure of correlation, which is intended to measure the strength of relationship. In contrast to other coefficients, the algebraic structure of Kendall's τ is much simpler. It can be computed from ranks, or directly from the actual observations without first converting them to ranks. It ranges from +1 to -1. Kendall's τ can measure linear and non-linear relationships as long as they are monotonic (continuously increasing or decreasing) and is robust to outliers. A bivariate normal distribution of observations for the variables is not required to compute Kendall's τ.

**Computation:** Kendall's τ was computed according to Sokal & Rohlf (1995). Confidence interval and power of test were determined as suggested by Bonett & Wright (2000).
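To show why no ranking step is needed, here is a minimal sketch that counts concordant and discordant pairs directly on the raw observations. It assumes the simplest form of the coefficient (often called tau-a), in which tied pairs count toward neither group. The tie handling actually used for CSB.DB is not described here, and the function name and data are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau (tau-a): (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both variables change in the same direction
        elif product < 0:
            discordant += 1  # the variables change in opposite directions
        # product == 0 is a tie; it counts toward neither group here.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical observations for two variables.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
print(kendall_tau(x, y))
```

Only the sign of each pairwise difference matters, which is why the raw observations and their ranks give the same result.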

### Pearson's ρ (r)

The most common measure of correlation is the Pearson Product Moment Correlation (Pearson's ρ or Pearson's r). The Pearson correlation is a parametric measure of correlation and reflects the degree of linear relationship between two variables that are on an interval or ratio scale. It ranges from +1 to -1. The Pearson correlation can be affected by outliers, which may strongly increase or decrease the strength of relationship. The observations for both variables should be approximately (bivariate) normally distributed.

**Computation:** Pearson's ρ, confidence interval and power of test were computed according to Sokal & Rohlf (1995).
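The sensitivity to outliers can be seen in a small sketch. The function below is the textbook product moment formula, and the data are made up for illustration. It is not taken from CSB.DB.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_clean = [2.0, 4.0, 6.0, 8.0, 10.0]     # perfectly linear in x
y_outlier = [2.0, 4.0, 6.0, 8.0, -20.0]  # last measurement replaced by an outlier

r_clean = pearson(x, y_clean)
r_outlier = pearson(x, y_outlier)
print(r_clean, r_outlier)
```

A single extreme value is enough here to turn a perfect positive correlation into a negative one.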

To compute the p-value yourself, you can transform Pearson's r into approximately t-distributed values with N-2 degrees of freedom as follows:

t = r / sqrt[(1 - r²) / (N - 2)]

To compute a p-value in Excel, you can therefore use the following expression, where THE_CELL is the cell holding Pearson's r and N is the number of observations:
=TDIST(ABS(THE_CELL/SQRT((1-THE_CELL*THE_CELL)/(N-2))),(N-2),2)
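The same calculation can be sketched outside a spreadsheet. The example below is an illustration under stated assumptions, not the CSB.DB code. It uses only the Python standard library, so the two-tailed tail probability of Student's t distribution is approximated by Simpson's rule rather than by a statistics library. The function names and the `steps` parameter are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)); undefined for r = +1 or -1."""
    return r / sqrt((1.0 - r * r) / (n - 2))


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    norm = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return norm * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value as 1 - 2 * integral of the density from 0 to |t|.

    The integral is approximated with Simpson's rule (steps must be even).
    Very small p-values lose precision because of the subtraction from 1.
    """
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_mass = total * h / 3
    return max(0.0, 1.0 - 2.0 * central_mass)


# Hypothetical input: r = 0.6 observed on N = 18 measurements.
r, n = 0.6, 18
t = t_statistic(r, n)
print("t =", t, "p =", two_tailed_p(t, n - 2))
```

The final argument `2` in the Excel expression requests the two-tailed probability, which is what `two_tailed_p` approximates.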

### Mutual Information (mi)

The mutual information (mi) is a general measure of statistical dependence. The mutual information quantifies the reduction in the uncertainty of one random variable given knowledge about another random variable. The natural logarithm was used to compute the mutual information. The MI was adjusted for systematic bias and converted into a distance by d(MI) = exp(-MI). d(MI) is in the range of 0 to 1. Values near zero represent strong dependency; 1 reflects that gene A and B are independent.

**Computation of d(MI):** The mi of gene A and B was computed based on the Shannon entropy (Shannon, 1948; overview: Steuer *et al.*, 2002). Instead of using the log base 2 (Steuer *et al.*, 2002, eq.(4)), we used the natural logarithm (ln). The obtained mutual information was adjusted for systematic bias (Steuer *et al.*, 2002, eq.(20)) to obtain the true mi. Negative values obtained were treated as missing observations ('NA'). The true mi was then converted into a distance by d(MI) = exp(-MI). d(MI) is in the range of 0 to 1.
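As a rough illustration of the steps above, the sketch below estimates the mutual information from a simple equal-width histogram using the natural logarithm and converts it to a distance. Several points are assumptions rather than the CSB.DB procedure: the binning scheme and `n_bins` are hypothetical, the bias adjustment of Steuer *et al.* is not reproduced, and the conversion is taken to be exp(-MI), the form consistent with the stated range of 0 to 1 and with 1 meaning independence.

```python
from collections import Counter
from math import exp, log


def bin_index(value, low, high, n_bins):
    """Map a value to an equal-width bin index in [0, n_bins - 1]."""
    if high == low:
        return 0
    idx = int((value - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # the maximum falls into the last bin


def mutual_information(x, y, n_bins=4):
    """Naive histogram estimate of MI in nats (no bias adjustment)."""
    n = len(x)
    low_x, high_x, low_y, high_y = min(x), max(x), min(y), max(y)
    bx = [bin_index(v, low_x, high_x, n_bins) for v in x]
    by = [bin_index(v, low_y, high_y, n_bins) for v in y]
    count_x, count_y, count_xy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (i, j), c in count_xy.items():
        # p(i, j) * ln( p(i, j) / (p(i) * p(j)) )
        mi += (c / n) * log(c * n / (count_x[i] * count_y[j]))
    return mi


def mi_distance(mi):
    """Assumed conversion d(MI) = exp(-MI): 1 for MI = 0, toward 0 as MI grows."""
    return exp(-mi)


# Hypothetical two-level measurements for gene A and gene B.
dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1], n_bins=2)
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1], n_bins=2)
print(mi_distance(dependent), mi_distance(independent))
```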

### Euclidean Distance

The Euclidean distance is a commonly used measure of distance. The distance between two points is the length of the straight path connecting them. The Euclidean distance is the square root of the sum of squared differences between two vectors of observations, e.g. expression measurements for gene A and B. Normalization can be done by dividing the Euclidean distance by the maximum of all Euclidean distances. The normalized Euclidean distance is in the range of 0 to 1. 0 means the expression levels of gene A and B are closely related, whereas 1 reflects the most distant behaviour.

**Computation of d(E):** The Euclidean distance was computed according to Mirkin (1996). To normalize the Euclidean distance, each distance was divided by the maximum distance obtained for the whole matrix: d(E) = d`E` / max(d`E`). The resulting normalized Euclidean distances are in the range of 0 to 1.
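The normalization can be sketched as follows. The example assumes the usual definition of the Euclidean distance (square root of the sum of squared differences). The gene names, the profiles and the dictionary layout are hypothetical and only serve as an illustration.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equally long observation vectors."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def normalized_distance_matrix(profiles):
    """Pairwise distances divided by the largest distance in the whole matrix."""
    names = list(profiles)
    raw = {(g, h): euclidean(profiles[g], profiles[h]) for g in names for h in names}
    largest = max(raw.values())
    if largest == 0.0:
        return raw  # all profiles are identical; nothing to scale
    return {pair: dist / largest for pair, dist in raw.items()}


# Hypothetical expression profiles (two measurements per gene).
profiles = {
    "geneA": [0.0, 0.0],
    "geneB": [3.0, 4.0],
    "geneC": [6.0, 8.0],
}
d = normalized_distance_matrix(profiles)
print(d[("geneA", "geneB")], d[("geneA", "geneC")])
```

Because the divisor is the maximum over the whole matrix, the normalized values depend on which genes are included: adding a more distant profile rescales every other entry.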