Joint-rank

About

Current high-throughput technologies, such as the Illumina WG-6 BeadChip, produce large numbers of gene expression profiles that must pass through several analyses before a scientific proposition can be stated. Among the standard protocols for processing these data, Joint-rank is an efficient utility for screening remarkable Fold Changes based on arbitrary selection criteria. The program ranks genes jointly across multiple experimental samples, so that the most affected genes can be identified simply by their top-ranked Fold Changes.

Usage


  jointrank [options] -FC <list>


    Options:
        -i or -infile <file>     The input file of expression data.
                                 If your data are provided by several files,
                                 give a comma separated list of them.
                                 (default: input by keyboard)
        -delim "xyz..."          Field delimiters. Each character of "xyz..."
                                 separates fields in input file. (default: tab)
        -o or -outfile <file>    Output file. (default: write to display screen)
                                 A result summary is also written to a second
                                 file with the same name and a ".summary"
                                 extension.
        -label <a>               The Fold Change columns in the output file are
                                 labeled by titles whose prefix is <a>.
                                 (default: -label F).
        -a or -arr <key>         Arrange the output list based on key.
                                 The key is 's', 'f', or a combination of the
                                 two letters. With 's', selected genes are
                                 sorted alphabetically by gene symbol; with
                                 'f', they are sorted by their Fold Changes.
                                 Note that the order of the letters in a
                                 combined key is significant, so -a sf gives
                                 a different arrangement from -a fs.
                                 (default: -a f).
        -fancy                   Put some fancy figures into output.

        -thrsh <threshold>       Fold change threshold.
                                 (default: 1.0; threshold free fold changes)
        -top <n>                 Select n top fold changes.
                                 (default: 1000; a trivial top list!)

        -pred <k>                A gene is selected when at least k of the
                                 listed fold changes meet the selection
                                 criterion.
                                 Here, k is an integer or "all" (default).

    <list> A comma separated list of column indexes and/or column headers
         from which the program evaluates fold changes and by which it sorts
         the affected genes.
         The list entries have the following format:

           TreatId/ControlId

         Each entry of the list represents a formula for one fold change.
         Replace TreatId and ControlId with integers denoting the indexes of
         the two columns that correspond to a treatment sample and a control
         sample, respectively.
         For convenience, you can use column headers instead of column indexes.

  If your input file contains pre-calculated fold changes instead of absolute
  signal values, simply omit "/ControlId" and give a list of single column ids.

  You don't have to use all columns of the input file for your assessment. You
  can run several assessments on different subsets of treatments and controls
  using a single data file.

  The program behaves properly with generic formats of gene expression data.
  For instance, it does not matter whether the input file has a header row
  or not.
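
The selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the program's source: the function names (parse_fc_list, fold_changes, select_genes), the 1-based column indexes, the plain treatment/control ratio, and the toy data are all assumptions made for this example. Header lookup and error handling (for instance a zero control signal) are left out.

```python
def parse_fc_list(spec):
    """Split an -FC style list into (treat, control) pairs.

    A bare entry such as "5" is read as a pre-calculated fold change,
    so its control is None.
    """
    pairs = []
    for entry in spec.split(","):
        treat, _, control = entry.strip().partition("/")
        pairs.append((treat, control or None))
    return pairs


def fold_changes(row, pairs):
    """Evaluate each fold change for one gene.

    `row` is a list of field strings; ids are ASSUMED to be 1-based
    column indexes (header lookup is not covered by this sketch).
    """
    values = []
    for treat, control in pairs:
        t = float(row[int(treat) - 1])
        values.append(t if control is None else t / float(row[int(control) - 1]))
    return values


def select_genes(rows, spec, thrsh=1.0, pred="all"):
    """Keep rows where at least `pred` fold changes are >= thrsh."""
    pairs = parse_fc_list(spec)
    k = len(pairs) if pred == "all" else int(pred)
    selected = []
    for row in rows:
        fcs = fold_changes(row, pairs)
        if sum(fc >= thrsh for fc in fcs) >= k:
            selected.append((row[0], fcs))
    return selected


# Hypothetical data: gene symbol, two treatments, two controls.
rows = [
    ["GENE_A", "400", "500", "100", "200"],
    ["GENE_B", "150", "500", "100", "200"],
]
hits = select_genes(rows, "2/4,2/5,3/4,3/5", thrsh=2)
print([gene for gene, _ in hits])
```

With the default pred="all", only GENE_A is kept, because all four of its ratios reach 2. Passing pred=2 would also keep GENE_B, since two of its four ratios reach the threshold.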

Example

Suppose we have performed gene expression experiments on two tissue cultures using Illumina BeadChip or a similar technology, and have obtained a raw expression profile like the one provided under Sample input. Suppose that in these experiments each primary sample was cultured with four siRNAs, two of which suppress the transcription factor FOXC1 and the other two the transcription factor PITX2, and that scrambled siRNA treatments served as an additional negative control for each primary sample.

In the first example, we look for genes jointly affected by FOXC1-targeted siRNAs in the TM1 samples. Genes with a fold change ≥ 2 with respect to both controls are considered affected.

 $  jointrank -thrsh 2 -i sample-expression-profile.txt -FC TM1-FOXC1-siRNA1/TM1-Control1,\
TM1-FOXC1-siRNA1/TM1-Control2,TM1-FOXC1-siRNA2/TM1-Control1,TM1-FOXC1-siRNA2/TM1-Control2

In the second example, we obtain the 40 genes with the highest fold changes in TM2 samples treated with PITX2-targeted siRNAs:

 $  jointrank -top 40 -i sample-expression-profile.txt -FC TM2-PITX2-siRNA1/TM2-Control1,\
TM2-PITX2-siRNA1/TM2-Control2,TM2-PITX2-siRNA2/TM2-Control1,TM2-PITX2-siRNA2/TM2-Control2

The same result is obtained if we replace the column headers with column indexes:

 $  jointrank -top 40 -i sample-expression-profile.txt -FC 13/15,13/16,14/15,14/16
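
Resolving a column id that may be either a header or an index could look like the sketch below. It assumes 1-based indexes and uses a shortened, hypothetical header row; resolve_column is an illustrative name, not part of jointrank.

```python
def resolve_column(col_id, header):
    """Return the 0-based position for a column id.

    ASSUMPTION: an all-digit id is a 1-based column index;
    anything else is looked up in the header row.
    """
    if col_id.isdigit():
        return int(col_id) - 1
    return header.index(col_id)


# Hypothetical header row, shortened for illustration.
header = ["Symbol", "TM2-PITX2-siRNA1", "TM2-PITX2-siRNA2",
          "TM2-Control1", "TM2-Control2"]

by_name = [resolve_column(c, header)
           for c in "TM2-PITX2-siRNA1/TM2-Control1".split("/")]
by_index = [resolve_column(c, header) for c in "2/4".split("/")]
print(by_name == by_index)
```

Both forms resolve to the same pair of columns in this toy header. A header made up only of digits would be ambiguous under this rule.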

We can also make the program produce separate result and summary files:

 $  jointrank -top 40 -i sample-expression-profile.txt -FC 13/15,13/16,14/15,14/16 -o TM2-PITX2-top40

and use self-explanatory titles for the Fold Change columns:

 $  jointrank -top 40 -i sample-expression-profile.txt -FC 13/15,13/16,14/15,14/16 -label PITX2-FC -o TM2-PITX2-top40


A more sophisticated application of jointrank is as follows:

 $  jointrank -thrsh 2 -pred 4 -i glaucoma-sample.txt -FC 7/9,7/10,13/15,13/16,8/9,8/10,14/15,14/16\
 -label PITX2-FC -o foureighth
 $  jointrank -thrsh 2 -pred 1 -i foureighth -FC PITX2-FC1,PITX2-FC2,PITX2-FC3,PITX2-FC4 -o PITX2-siRNA1
 $  jointrank -thrsh 2 -pred 1 -i foureighth -FC PITX2-FC5,PITX2-FC6,PITX2-FC7,PITX2-FC8 -o PITX2-siRNA2
 $  jointrank -i PITX2-siRNA1,PITX2-siRNA2 -FC 4,5,6,7,11,12,13,14 -o PITX2

Here the first command selects genes for which at least four of the eight Fold Changes are ≥ 2. The second and third commands select genes for which at least one of four Fold Changes is ≥ 2 in treatments with siRNA1 and siRNA2, respectively. The final command determines the genes common to these two results, that is, genes that show at least four Fold Changes ≥ 2 among the eight, with each siRNA appearing at least once among the corresponding comparisons.
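
The combined criterion of these four commands can be written as a single predicate. The sketch below is only an illustration with made-up fold change values; count_hits and passes_combined are hypothetical helpers, and the first four values are taken to be the siRNA1 comparisons and the last four the siRNA2 comparisons, following the order of the -FC list in the first command.

```python
def count_hits(fcs, thrsh=2.0):
    """Number of fold changes that reach the threshold."""
    return sum(fc >= thrsh for fc in fcs)


def passes_combined(fcs, thrsh=2.0):
    """At least 4 of 8 hits overall, and at least 1 hit per siRNA."""
    sirna1, sirna2 = fcs[:4], fcs[4:]
    return (count_hits(fcs, thrsh) >= 4
            and count_hits(sirna1, thrsh) >= 1
            and count_hits(sirna2, thrsh) >= 1)


# Made-up fold changes for three genes (8 comparisons each).
genes = {
    "GENE_A": [2.5, 3.0, 2.1, 1.2, 2.2, 0.9, 1.1, 1.0],  # 3 + 1 hits
    "GENE_B": [2.5, 3.0, 2.1, 2.4, 1.2, 0.9, 1.1, 1.0],  # 4 + 0 hits
    "GENE_C": [2.5, 1.0, 1.1, 1.2, 2.2, 0.9, 1.1, 1.0],  # 1 + 1 hits
}
print([g for g, fcs in genes.items() if passes_combined(fcs)])
```

Only GENE_A passes: GENE_B has four hits but none with siRNA2, and GENE_C has a hit for each siRNA but only two in total.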


For the last example, which is also a serial application of jointrank, we determine affected genes based on "rank analysis":

 $  jointrank -top 40 -i glaucoma-sample.txt -FC 7/9,7/10,13/15,13/16 -label PITX2-siRNA1-FC -o PITX2-top40A
 $  jointrank -top 40 -i glaucoma-sample.txt -FC 8/9,8/10,14/15,14/16 -label PITX2-siRNA2-FC -o PITX2-top40B
 $  jointrank -top 40 -i glaucoma-sample.txt -FC 7/9,7/10,13/15,13/16,8/9,8/10,14/15,14/16 -label PITX2-FC\
 -o PITX2-top40C
 $  jointrank -i PITX2-top40A,PITX2-top40B,PITX2-top40C -FC 4,5,6,7,11,12,13,14,18,19,20,21,22,23,24,25\
 -o PITX2-top40ABC

Here the first two commands find the 40 genes with the highest Fold Changes in treatments with siRNA1 and siRNA2, respectively, and the third command selects the top 40 genes based on the joint ranking of all PITX2 Fold Changes. The final command finds the genes shared by all three groups.
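
This page does not spell out how the joint ranking is computed, so the following sketch adopts one plausible reading purely as an assumption: each gene is scored by its smallest fold change across the listed comparisons (in the spirit of the default -pred all), and the n highest-scoring genes are kept. The name top_genes and the toy values are hypothetical; only the final intersection step mirrors the description above directly.

```python
def top_genes(table, columns, n):
    """Return the n genes with the largest minimum fold change.

    ASSUMPTION: scoring by the minimum over `columns` is one possible
    definition of a joint rank, not necessarily the one jointrank uses.
    """
    scores = {gene: min(fcs[c] for c in columns) for gene, fcs in table.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


# Toy fold changes: columns 0-1 stand for siRNA1, columns 2-3 for siRNA2.
table = {
    "GENE_A": [4.0, 3.5, 3.0, 2.8],
    "GENE_B": [5.0, 4.5, 1.1, 1.0],
    "GENE_C": [1.2, 1.1, 6.0, 5.0],
    "GENE_D": [3.0, 2.9, 2.5, 2.6],
}
group_a = top_genes(table, [0, 1], n=2)        # siRNA1 only
group_b = top_genes(table, [2, 3], n=2)        # siRNA2 only
group_c = top_genes(table, [0, 1, 2, 3], n=2)  # all comparisons
common = group_a & group_b & group_c
print(sorted(common))
```

In this toy table GENE_A is the only gene present in all three top lists, which is the role played by PITX2-top40ABC in the commands above.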

Installation

Joint-rank is written in C++ and should compile on any Unix-like system. To install the package, download the source code, unpack it, and compile it. This creates the jointrank executable.

 $  tar xvzf jointrank.tar.gz
 $  cd jointrank
 $  g++ jointrank.cpp -o jointrank 

[ Click here to start download ]

Sample input

A sample input file is available at the following link:

[ Click here to start download ]

Contact

Ali Katanforoush, <a_katanforosh@sbu.ac.ir>, Department of Computer Science, Shahid Beheshti University, G.C., Tehran.

Time-stamp: "2010-10-09 15:31:03 katanforoush"