CSB.DB: Help

Help@CSB.DB: Coefficient - Transcript Co-Response Queries

If you want to get help directly related to a page/query, use the Info Pages / Medium Info Pages. Direct links are available at each (Query) Page.

If you are completely lost, here is a link to a short description of what CSB.DB is and is not. Enter this page.

Coefficient

Co-response can be measured by various statistical coefficients. Currently, no public convention exists as to which numerical approach is best applied to detect and validate co-response of changing transcript levels. For this reason we integrated a range of different statistical and computational algorithms, which are routinely applied in various research areas.
Use the links listed below to get information on the coefficient of your choice.

Correlation Coefficient: measures the magnitude and direction of the association between two variables.

Spearman | Kendall | Pearson |

Distance Coefficients & Other Coefficients:

Euclidean Distance | Mutual Information |

Correlation Coefficients:

Correlation Coefficient

Correlation coefficients measure the magnitude and direction of the association between two variables, that is, the degree to which the two are related. Correlation coefficients range from +1 to -1. The closer the correlation is to either +1 or -1, the stronger the relationship. If the correlation is near zero, there is little or no association between the two variables.
The magnitude is the strength of the correlation (relationship). Usually it means the strength of the tendency of two variables, X and Y, to move in the same (or opposite) direction. The direction (sign) of the correlation indicates how the two variables are related. If the correlation is positive, the two variables move in the same direction: as one variable increases, the other increases as well. If the correlation is negative, they move in opposite directions: as one variable increases, the other decreases.

Spearman's rs

Spearman's rs is a non-parametric measure of correlation. The Spearman correlation is based on the ranked observations of the two variables and therefore makes no assumption about the distribution of the values. It ranges from +1 to -1. Spearman's rs is the non-parametric 'variant' of the Pearson correlation, computed on ranked data, and can detect linear as well as non-linear monotonic relationships (continuously increasing or decreasing). Spearman's rs is robust to outliers and does not require bivariate normally distributed observations.
Computation: Spearman's rs was computed as Pearson's correlation on ranked expression measurements according to Sokal & Rohlf (1995). Confidence Interval & Power of Test were determined as suggested by Bonett & Wright (2000).
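The "Pearson on ranks" computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CSB.DB implementation: the function names (average_ranks, pearson, spearman) and the example values are hypothetical, and ties are given the mean of their ranks, which is a common convention but is not confirmed by this page.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rs: Pearson correlation of the ranked observations."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical expression values: gene_b rises monotonically but non-linearly with gene_a.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 8.0, 27.0, 64.0, 125.0]
rs = spearman(gene_a, gene_b)
r = pearson(gene_a, gene_b)
```

In this example the ranks of both genes are identical, so rs reaches its maximum, while Pearson's r on the raw values stays below 1 because the relationship is not linear.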

Kendall's τ

Kendall's τ is a non-parametric measure of correlation, which is intended to measure the strength of relationship. In contrast to other coefficients, the algebraic structure of Kendall's τ is much simpler. It can be computed from ranks, but also directly from the actual observations without first converting them to ranks. It ranges from +1 to -1. Kendall's τ can measure linear as well as non-linear monotonic relationships (continuously increasing or decreasing) and is robust to outliers. A bivariate normal distribution of the observations is not required to compute Kendall's τ.
Computation: Kendall's τ was computed according to Sokal & Rohlf (1995). Confidence Interval & Power of Test were determined as suggested by Bonett & Wright (2000).
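To illustrate why no ranking step is needed, the following Python sketch counts concordant and discordant pairs directly on the observations. It is an assumption-laden illustration: it computes the simple tau-a variant without any correction for ties, whereas the exact variant used by CSB.DB (following Sokal & Rohlf) is not specified on this page. The function name kendall_tau_a and the example values are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs, no tie correction."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables change in the same direction
        elif sign < 0:
            discordant += 1   # the variables change in opposite directions
        # sign == 0 means a tie in x or y; tau-a ignores it in the numerator
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical expression values for two genes.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [1.0, 3.0, 2.0, 4.0]
tau = kendall_tau_a(gene_a, gene_b)   # 5 concordant, 1 discordant pair out of 6
```

The double loop over pairs is quadratic in the number of observations, which is acceptable for a sketch but slow for long expression profiles.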

Pearson's ρ (r)

The most common measure of correlation is the Pearson Product Moment Correlation (Pearson's ρ or Pearson's r). The Pearson correlation is a parametric measure of correlation and reflects the degree of linear relationship between two variables that are on an interval or ratio scale. It ranges from +1 to -1. The Pearson correlation can be affected by outliers, which may strongly increase or decrease the apparent strength of the relationship. The observations for both variables should be approximately (bivariate) normally distributed.
Computation: Pearson's ρ, Confidence Interval & Power of Test were computed according to Sokal & Rohlf (1995).
To compute the p-value yourself, you can transform Pearson's r into a value that is approximately t-distributed with N-2 degrees of freedom, where N is the number of observations:
t = r / sqrt[(1 - r²) / (N - 2)]
To compute the two-tailed p-value in Excel, you can therefore use the following expression, where THE_CELL is the cell containing r: =TDIST(ABS(THE_CELL/SQRT((1-THE_CELL*THE_CELL)/(N-2))),(N-2),2)
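The same transformation can be reproduced outside a spreadsheet. The Python sketch below computes t from r and N and then approximates the two-tailed p-value by numerically integrating the Student t density with Simpson's rule. The integration is a choice made here to stay within the standard library, not the method used by CSB.DB. The function names and the step count are hypothetical, and the result loses precision for very small p-values because it is obtained as 1 minus the central area.

```python
import math

def t_statistic(r, n):
    """Transform Pearson's r into t with n - 2 degrees of freedom (requires |r| < 1, n > 2)."""
    return r / math.sqrt((1.0 - r * r) / (n - 2))

def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(x * x / df))

def two_tailed_p(r, n, steps=10000):
    """Approximate two-tailed p-value; steps must be even (Simpson's rule)."""
    df = n - 2
    t = abs(t_statistic(r, n))
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = total * h / 3.0          # integral of the density over [0, |t|]
    return max(0.0, 1.0 - 2.0 * central_area)

# Hypothetical example: r = 0.9 observed on N = 4 measurements.
p_value = two_tailed_p(0.9, 4)
```

For N = 4 the t distribution with 2 degrees of freedom has a closed form, and the two-tailed p-value reduces to 1 - |r|, which gives a convenient check of the sketch.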

Mutual Information & Distance Measure

Mutual Information (mi)

The mutual information (mi) is a general measure of statistical dependence between two variables. It quantifies the reduction in the uncertainty of one random variable given knowledge about another random variable. The natural logarithm was used to compute the mutual information. The mi was adjusted for systematic bias and converted into a distance by d(MI) = exp(-MI). The d(MI) is in the range of 0 to 1. Values near zero represent strong dependency; 1 reflects that gene A and B are independent.
Computation of d(MI): The mi of gene A and B was computed based on the Shannon entropy (Shannon, 1948; overview: Steuer et al., 2002). Instead of the logarithm to base 2 (Steuer et al., 2002, eq.(4)) we used the natural logarithm (ln). The obtained mutual information was adjusted for systematic bias (Steuer et al., 2002, eq.(20)) to obtain the true mi. Negative values obtained were treated as missing observations ('NA'). The true mi was then converted into a distance by use of d(MI) = exp(-MI). d(MI) is in the range of 0 to 1.
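As a rough illustration of the steps described here, the Python sketch below estimates the mutual information of two expression profiles from a joint histogram using the natural logarithm and converts it to a distance. Several points are assumptions rather than the CSB.DB procedure: the equal-width binning and the number of bins are hypothetical choices, the bias adjustment of Steuer et al. (2002, eq.(20)) is not reproduced, and the distance transform is taken to be exp(-MI), which is the reading consistent with the stated range of 0 to 1.

```python
import math
from collections import Counter

def bin_values(values, bins):
    """Assign each value to one of `bins` equal-width bins (hypothetical discretization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(a, b):
    """Mutual information (natural log) of two equally long sequences of discrete labels."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    mi = 0.0
    for (i, j), c in count_ab.items():
        # p(i,j) * ln( p(i,j) / (p(i) * p(j)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[i] * count_b[j]))
    return mi

def mi_distance(x, y, bins=2):
    """d(MI) = exp(-MI), without the bias adjustment (assumption, see lead-in)."""
    return math.exp(-mutual_information(bin_values(x, bins), bin_values(y, bins)))

# Hypothetical profiles: gene_b follows gene_a exactly, gene_c is unrelated to gene_a.
gene_a = [0.0, 0.0, 1.0, 1.0]
gene_b = [0.0, 0.0, 1.0, 1.0]
gene_c = [0.0, 1.0, 0.0, 1.0]
d_dependent = mi_distance(gene_a, gene_b)     # MI = ln 2, so the distance is 0.5
d_independent = mi_distance(gene_a, gene_c)   # MI = 0, so the distance is 1
```

With only two bins the mutual information cannot exceed ln 2, so the distance cannot fall below 0.5 in this toy setting; more bins and more observations are needed before d(MI) can approach zero.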

Euclidean Distance

The Euclidean distance is a commonly used measure of distance. The distance between two points is the length of the straight line connecting them. For two vectors of observations, e.g. expression measurements for gene A and B, the Euclidean distance is the square root of the sum of the squared differences between corresponding observations. Normalization can be done by dividing each Euclidean distance by the maximum of all Euclidean distances. The normalized Euclidean distance is in the range of 0 to 1. 0 means the expression levels of gene A and B are closely related, whereas 1 reflects the most distant behaviour.
Computation of d(E): The Euclidean distance was computed according to Mirkin (1996). To normalize the Euclidean distance, each distance was divided by the maximum distance obtained for the whole matrix: d(E) = dE / max(dE). The resulting normalized Euclidean distances are in the range of 0 to 1.
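The normalization d(E) = dE / max(dE) can be sketched as follows in Python. The gene names, the example values and the function names are hypothetical, and the dictionary of pairwise distances merely stands in for the distance matrix mentioned above.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equally long vectors of observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def normalized_distances(profiles):
    """Pairwise Euclidean distances divided by the largest distance among all pairs."""
    raw = {
        (g, h): euclidean(profiles[g], profiles[h])
        for g, h in combinations(sorted(profiles), 2)
    }
    d_max = max(raw.values())
    if d_max == 0.0:
        # All profiles are identical; every normalized distance is zero.
        return {pair: 0.0 for pair in raw}
    return {pair: d / d_max for pair, d in raw.items()}

# Hypothetical expression profiles with two measurements per gene.
profiles = {
    "A": [0.0, 0.0],
    "B": [3.0, 4.0],
    "C": [6.0, 8.0],
}
d_e = normalized_distances(profiles)
```

Because the maximum is taken over the whole matrix, a normalized distance is only comparable within one data set: adding or removing a single distant gene changes every d(E) value.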
