Current high-throughput technologies, such as the Illumina WG-6 BeadChip, produce large numbers of gene expression profiles that must pass through several analyses before a scientific proposition can be stated. Among the standard protocols for processing these data, Joint-rank is an efficient utility for screening notable Fold Changes based on arbitrary selection criteria. The program ranks genes jointly across multiple experimental samples, so that the most affected genes can be identified simply by their top-ranked Fold Changes.
jointrank [options] -FC <list>

Options:

  -i or -infile <file>
      The input file of expression data. If your data are spread over several files, give a comma-separated list of them. (default: input from keyboard)

  -delim "xyz..."
      Field delimiters. Each character of "xyz..." separates fields in the input file. (default: tab)

  -o or -outfile <file>
      Output file. (default: write to display screen) A result summary is also written to a second output with the same file name and a ".summary" extension.

  -label <a>
      The Fold Change columns in the output file are labeled with titles whose prefix is <a>. (default: -label F)

  -a or -arr <key>
      Arrange the output list based on key. The key is one of, or a combination of, the two letters 's' and 'f'. With 's', selected genes are sorted alphabetically by gene symbol; with 'f', they are sorted by their Fold Changes. The order of the letters in a combined key is significant: -a sf produces a different arrangement than -a fs. (default: -a f)

  -fancy
      Put some fancy figures into the output.

  -thrsh <threshold>
      Fold change threshold. (default: 1.0; threshold-free fold changes)

  -top <n>
      Select the n top fold changes. (default: 1000; a trivial top list!)

  -pred <k>
      The selection is predicated on k observations. Here, k is an integer or "all" (default).

  <list>
      A comma-separated list of column indexes and/or column headers on which the program evaluates fold changes and sorts the affected genes. The list entries have the following format:

          TreatId/ControlId

      Each entry of the list represents a formula for one fold change. Replace TreatId and ControlId with integers denoting the indexes of the two columns that correspond to a treatment sample and a control sample, respectively. For convenience, you can use column headers instead of column indexes. If your input file contains pre-calculated fold changes instead of absolute signal values, simply omit "/ControlId" and give a list of single column ids.
You don't have to use all columns of the input file for your assessment. You can run several assessments on different subsets of treatments and controls using a single data file. The program handles generic formats of gene expression data; for instance, it does not matter whether the input file has a header row or not.
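The entry format of the -FC list can be illustrated with a short Python sketch. This is not the program's own code (jointrank is written in C++). The function names `parse_fc_list`, `resolve` and `fold_change`, the 1-based column numbering, the plain treatment/control ratio, and the sample data are all assumptions made for illustration.

```python
def parse_fc_list(spec):
    """Split a -FC style list into (treatment, control) pairs.

    A control of None marks a single-column entry,
    i.e. a pre-calculated fold change.
    """
    pairs = []
    for entry in spec.split(","):
        treat, sep, control = entry.strip().partition("/")
        pairs.append((treat, control if sep else None))
    return pairs


def resolve(col_id, header):
    """Map a column id to a 0-based position.

    Assumption: integer ids are 1-based; anything else is a header name.
    """
    if col_id.isdigit():
        return int(col_id) - 1
    return header.index(col_id)


def fold_change(row, pair, header):
    """Evaluate one list entry on one data row (assumed: plain ratio)."""
    treat, control = pair
    value = float(row[resolve(treat, header)])
    if control is None:
        return value  # pre-calculated fold change, used as-is
    return value / float(row[resolve(control, header)])


# Made-up header and data row, for illustration only.
header = ["Symbol", "T1", "C1"]
row = ["GENE_A", "8.0", "2.0"]

pairs = parse_fc_list("T1/C1,2/3,T1")
print([fold_change(row, p, header) for p in pairs])  # [4.0, 4.0, 8.0]
```

In this sketch the entries T1/C1 and 2/3 name the same two columns, once by header and once by index, so they yield the same value.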
Suppose we have performed gene expression experiments on two tissue cultures using Illumina BeadChip or a similar technology, and have obtained a raw expression profile like this. Suppose that in these experiments each primary sample has been cultured with four siRNAs, two of which suppress the transcription factor FOXC1 and the other two the transcription factor PITX2, and that scrambled siRNA treatments provide additional negative controls for each primary sample.
In the first example, we look for genes jointly affected by FOXC1-targeted siRNAs in samples of TM1. Genes with a fold change ≥ 2 with respect to both controls are considered affected.
$ jointrank -thrsh 2 -i sample-expression-profile.txt -FC TM1-FOXC1-siRNA1/TM1-Control1,\
TM1-FOXC1-siRNA1/TM1-Control2,TM1-FOXC1-siRNA2/TM1-Control1,TM1-FOXC1-siRNA2/TM1-Control2
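The selection rule of this first example (every listed fold change must reach the threshold) can be sketched in Python as follows. The function name `affected` and the gene values are made up for illustration, and the sketch assumes fold changes are plain treatment/control ratios; it is not taken from the jointrank source.

```python
def affected(fold_changes, threshold=2.0):
    """True when every fold change in the list reaches the threshold."""
    return all(fc >= threshold for fc in fold_changes)


# Made-up fold changes for the four comparisons of each gene.
genes = {
    "GENE_A": [2.5, 3.1, 2.0, 4.2],  # all four comparisons reach 2
    "GENE_B": [2.5, 1.9, 2.0, 4.2],  # one comparison falls short
}

selected = [gene for gene, fcs in genes.items() if affected(fcs)]
print(selected)  # ['GENE_A']
```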
In the second example, the 40 genes with the highest fold changes in TM2 samples treated with PITX2-targeted siRNAs are obtained:
$ jointrank -top 40 -i sample-expression-profile.txt -FC TM2-PITX2-siRNA1/TM2-Control1,\
TM2-PITX2-siRNA1/TM2-Control2,TM2-PITX2-siRNA2/TM2-Control1,TM2-PITX2-siRNA2/TM2-Control2
The same result is obtained if we replace the column headers with column indexes:
$ jointrank -top 40 -i sample-expression-profile.txt -FC 13/15,13/16,14/15,14/16
We can also make the program write the result and the summary to separate files:
$ jointrank -top 40 -i sample-expression-profile.txt -FC 13/15,13/16,14/15,14/16 -o TM2-PITX2-top40
and use self-explanatory titles for the Fold Changes:
$ jointrank -top 40 -i sample-expression-profile.txt -FC 13/15,13/16,14/15,14/16 -label PITX2-FC -o TM2-PITX2-top40
A more sophisticated application of jointrank is as follows:
$ jointrank -thrsh 2 -pred 4 -i glaucoma-sample.txt -FC 7/9,7/10,13/15,13/16,8/9,8/10,14/15,14/16\
  -label PITX2-FC -o foureighth
$ jointrank -thrsh 2 -pred 1 -i foureighth -FC PITX2-FC1,PITX2-FC2,PITX2-FC3,PITX2-FC4 -o PITX2-siRNA1
$ jointrank -thrsh 2 -pred 1 -i foureighth -FC PITX2-FC5,PITX2-FC6,PITX2-FC7,PITX2-FC8 -o PITX2-siRNA2
$ jointrank -i PITX2-siRNA1,PITX2-siRNA2 -FC 4,5,6,7,11,12,13,14 -o PITX2
in which the first command selects genes for which at least four of the eight Fold Changes are ≥ 2; the second and third commands select genes for which at least one of four Fold Changes is ≥ 2, in treatments with siRNA1 and siRNA2 respectively; and the final command determines the genes common to these two results, that is, genes that show at least four Fold Changes ≥ 2 among the eight, with each siRNA appearing at least once among the corresponding comparisons.
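The logic of this four-step pipeline can be mimicked in a few lines of Python. The sketch below is only an illustration of the "at least k of n" predicate and the final intersection as described above; the function name `passes`, the gene names and the fold-change values are hypothetical, and the code does not reproduce jointrank's file handling or output format.

```python
def passes(fold_changes, threshold, k):
    """True when at least k of the fold changes reach the threshold."""
    return sum(fc >= threshold for fc in fold_changes) >= k


# Made-up data: positions 0-3 are siRNA1 comparisons, 4-7 are siRNA2.
genes = {
    "GENE_A": [2.1, 2.4, 1.0, 1.2, 3.0, 2.2, 1.1, 0.9],  # 2 + 2 hits
    "GENE_B": [2.1, 2.4, 2.6, 3.2, 1.0, 1.2, 1.1, 0.9],  # 4 hits, siRNA1 only
    "GENE_C": [2.1, 1.4, 1.0, 1.2, 3.0, 1.2, 1.1, 0.9],  # 2 hits in total
}

# Step 1: at least four of the eight fold changes are >= 2.
step1 = {g: fcs for g, fcs in genes.items() if passes(fcs, 2.0, 4)}

# Steps 2 and 3: at least one hit among the comparisons of each siRNA.
sirna1 = {g for g, fcs in step1.items() if passes(fcs[:4], 2.0, 1)}
sirna2 = {g for g, fcs in step1.items() if passes(fcs[4:], 2.0, 1)}

# Step 4: genes common to both results.
common = sorted(sirna1 & sirna2)
print(common)  # ['GENE_A']
```

GENE_B illustrates why the second and third steps matter: it reaches four hits out of eight, but all of them come from siRNA1, so it drops out of the final list.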
The last example, which is also a serial application of jointrank, determines affected genes by "rank analysis":
$ jointrank -top 40 -i glaucoma-sample.txt -FC 7/9,7/10,13/15,13/16 -label PITX2-siRNA1-FC -o PITX2-top40A
$ jointrank -top 40 -i glaucoma-sample.txt -FC 8/9,8/10,14/15,14/16 -label PITX2-siRNA2-FC -o PITX2-top40B
$ jointrank -top 40 -i glaucoma-sample.txt -FC 7/9,7/10,13/15,13/16,8/9,8/10,14/15,14/16 -label PITX2-FC\
  -o PITX2-top40C
$ jointrank -i PITX2-top40A,PITX2-top40B,PITX2-top40C -FC 4,5,6,7,11,12,13,14,18,19,20,21,22,23,24,25\
  -o PITX2-top40ABC
in which the first two commands find the 40 genes with the highest Fold Changes in treatments with siRNA1 and siRNA2 respectively, and the third command selects the top 40 genes based on joint ranking of all PITX2 Fold Changes. The final command finds the genes shared by all three groups.
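How jointrank combines several Fold Change columns into one ranking is not spelled out here, so the Python sketch below takes the three top lists as given and illustrates only the final step, finding the genes shared by all groups. The function name `common_genes` and the gene lists are hypothetical.

```python
def common_genes(*gene_lists):
    """Return the genes present in every list, in the order of the first list."""
    shared = set.intersection(*(set(lst) for lst in gene_lists))
    return [gene for gene in gene_lists[0] if gene in shared]


# Made-up top lists standing in for the three result files.
top_a = ["GENE_A", "GENE_B", "GENE_C"]
top_b = ["GENE_C", "GENE_A", "GENE_D"]
top_c = ["GENE_A", "GENE_C", "GENE_E"]

print(common_genes(top_a, top_b, top_c))  # ['GENE_A', 'GENE_C']
```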
Joint-rank is written in C++ and should compile on any Unix-like system. To install the package, download the source code, unpack it, and compile the source. This creates the jointrank executable.
$ tar xvzf jointrank.tar.gz
$ cd jointrank
$ g++ jointrank.cpp -o jointrank
[ Click here to start download ]
A sample input file is available at the following link:
[ Click here to start download ]
Ali Katanforoush, <a_katanforosh@sbu.ac.ir>, Department of Computer Science, Shahid Beheshti University, G.C., Tehran.
Time-stamp: "2010-10-09 15:31:03 katanforoush"