CSB.DB: Statistics & Mathematics

Statistics@CSB.DB

To gain insight into life's complexity pyramid and the higher-order interactions of genes, metabolites, and proteins, a more holistic view is needed, one that draws on statistical and other tools, ideas, and concepts. CSB.DB therefore tries to integrate these concepts from various fields of research. This page gives a first overview of our activities to integrate these concepts and to give feedback to the scientific community.

Use the links listed below to find the information you need.

Profile Processing: A short description of the profile processing strategy.
Information Retrieval: A first insight into accessing the results of statistical analyses.

Profile Processing

CSB.DB: Profile Processing

Sequence similarity analysis is routinely applied to the functional annotation of unknown genes. Its results depend on, and are restricted to, fully characterised genes or protein domains. Multi-parallel monitoring of mRNA levels now allows gene function to be predicted beyond sets of homologous genes (Wu et al., 2002). The high demand for multi-parallel analyses at various levels, covering complete genomes or large genome sections of various organisms, and the complexity of the information obtained have inspired both computational and statistical science. Transcriptional co-response analysis, which describes the co-expression behaviour of transcripts, is one of the most frequently used and further developed techniques. CSB.DB uses publicly available expression profiles of various organisms as rich resources for such cross-experiment investigations. The underlying transcript profiles were quality checked, and well-measured genes were used to generate multi-conditional expression matrices. The resulting filtered expression matrices contain only 5% missing expression data per gene. Correlations were computed with the program cCoR (Steinhauser et al., unpublished) and cover a variety of parametric and non-parametric pairwise measures, such as the parametric Pearson linear product-moment correlation (r), the non-parametric Spearman correlation (rs) and the non-parametric Kendall coefficient of rank correlation (τ). These measures are computed and stored in organism-specific sub-databases and can be publicly accessed and queried via a web interface. The user can easily query the databases and is assisted by CSB.DB information to help with exploration.
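The three pairwise measures named above can be illustrated with a minimal Python sketch. This is not the cCoR program, whose implementation is unpublished. All function names are hypothetical, missing expression values are assumed to be encoded as None and dropped pairwise, and the Kendall variant shown is tau-a, which applies no correction for ties.

```python
import math


def pairwise_complete(x, y):
    """Keep only conditions where both profiles have a measured value.

    Assumption: missing expression values are encoded as None.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]


def pearson(x, y):
    """Pearson linear product-moment correlation (r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (rs): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical expression profiles of two genes across six conditions.
    gene_a = [1.0, 2.0, None, 4.0, 5.0, 6.0]
    gene_b = [2.1, 3.9, 6.2, 8.1, 9.8, 12.5]
    x, y = pairwise_complete(gene_a, gene_b)
    print(pearson(x, y), spearman(x, y), kendall_tau_a(x, y))
```

Computing the rank-based measures alongside Pearson's r is useful because they respond to any monotonic relationship rather than only a linear one, and they are less sensitive to single extreme measurements.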

Information Retrieval

CSB.DB: Information Retrieval

Gene co-responses (correlations) can be obtained via various query tools, which allow the user to pre-define a gene or set of genes of interest and which return HTML tables listing all genes associated by transcriptional co-response that meet the pre-set thresholds. The statistical parameters, namely the probability, the confidence interval and the power of the applied test, are calculated dynamically based on the underlying test distribution of the respective correlation coefficient. Moreover, a detailed statistical analysis of a selected gene pair can be obtained through user validation of the output. In addition, the validation provides a variety of graphical plots and allows the detection of outliers, which may be associated with technical errors or biological variance, or may represent a biological regulatory event.
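As an illustration of how a probability, a confidence interval and a power value can be derived for a correlation coefficient, the following Python sketch uses the Fisher z-transformation with a normal approximation for a Pearson r. This is a substitute technique chosen for brevity and is not necessarily the test distribution CSB.DB applies. The function name and the default significance level are assumptions, and the power shown treats the observed r as the true effect size.

```python
import math
from statistics import NormalDist


def fisher_z_summary(r, n, alpha=0.05):
    """Approximate p-value, confidence interval and power for a Pearson r.

    r     -- observed correlation, strictly between -1 and 1
    n     -- number of paired observations (must exceed 3)
    alpha -- two-sided significance level (assumed default: 0.05)
    """
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")

    std_normal = NormalDist()
    z = math.atanh(r)                 # Fisher z-transformed correlation
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    z_stat = z / se                   # test statistic under H0: rho = 0

    # Two-sided probability of a statistic at least this extreme under H0.
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z_stat)))

    # Confidence interval on the z scale, mapped back to the r scale.
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    ci = (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))

    # Power of the two-sided test if the true correlation equalled r.
    shift = abs(z_stat)
    power = (1.0 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

    return {"p_value": p_value, "ci": ci, "power": power}


if __name__ == "__main__":
    # Hypothetical gene pair: r = 0.8 observed across 20 conditions.
    print(fisher_z_summary(0.8, 20))
```

The normal approximation is adequate for moderate to large numbers of conditions. For small samples, or for the Spearman and Kendall coefficients, the respective exact or rank-based distributions would be the more appropriate choice.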