Publications

Journal Papers
2011
Haplotype block partitioning and tagSNP selection under the perfect phylogeny model.
Eslahchi, Ch., Katanforoush, A., Pezeshk, H., Afzaly, N.
(2011) Iranian Journal of Biotechnology, 9 (4), pp. 281-289.
[Abstract]
Single Nucleotide Polymorphisms (SNPs) are the most usual form of polymorphism in human genome. Analyses of genetic variations have revealed that individual genomes share common SNP-haplotypes. Particular pattern of these common variations forms a block-like structure on human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplotype blocks using samples of individual genomes. We give a rigorous definition of the quality of the partitioning of haplotypes into blocks and describe a greedy algorithm for finding the proper partitioning in case of perfect and semi-perfect phylogeny. It is shown that the minimum number of tagSNPs in a haplotype block of Perfect Phylogeny can be obtained by a polynomial time algorithm. We apply the algorithm to haplotype data of human chromosome 21 and compare the results with other previously developed methods through simulations. The results demonstrate that our algorithm outperforms the conventional implementation of the Four Gamete Test approach which is the only available method for haplotype block partitioning based on Perfect Phylogeny.
Effect of PITX2 knockdown on transcriptome of primary human trabecular meshwork cell cultures.
Paylakhi, SH., Fan, JB., Mehrabian, M., Sadeghizadeh, M., Yazdani, S., Katanforoush, A., Kanavi, MR., Ronaghi, M., Elahi, E.
(2011) Molecular Vision, 17, pp. 1209-1221.
Abstract
2009
Global haplotype partitioning for maximal associated SNP pairs
Katanforoush, A., Sadeghi, M., Pezeshk, H., Elahi, E.
(2009) BMC Bioinformatics, 10, art. no. 1471, p. 269.
[Abstract]
Background: Global partitioning based on pairwise associations of SNPs has not previously been used to define haplotype blocks within genomes. Here, we define an association index based on LD between SNP pairs. We use the Fisher's exact test to assess the statistical significance of the LD estimator. By this test, each SNP pair is characterized as associated, independent, or not- statistically-significant. We set limits on the maximum acceptable proportion of independent pairs within all blocks and search for the partitioning with maximal proportion of associated SNP pairs. Essentially, this model is reduced to a constrained optimization problem, the solution of which is obtained by iterating a dynamic programming algorithm. Results: In comparison with other methods, our algorithm reports blocks of larger average size. Nevertheless, the haplotype diversity within the blocks is captured by a small number of tagSNPs. Resampling HapMap haplotypes under a block-based model of recombination showed that our algorithm is robust in reproducing the same partitioning for recombinant samples. Our algorithm performed better than previously reported models in a case-control association study aimed at mapping a single locus trait, based on simulation results that were evaluated by a block-based statistical test. Compared to methods of haplotype block partitioning, we performed best on detection of recombination hotspots. Conclusion: Our proposed method divides chromosomes into the regions within which allelic associations of SNP pairs are maximized. This approach presents a native design for dimension reduction in genome-wide association studies. Our results show that the pairwise allelic association of SNPs can describe various features of genomic variation, in particular recombination hotspots.
2008
A tale of two symmetrical tails: Structural and functional characteristics of palindromes in proteins
Sheari, A., Kargar, M., Katanforoush, A., Arab, S., Sadeghi, M., Pezeshk, H., Eslahchi, C., Marashi, S.-A.
(2008) BMC Bioinformatics, 9, art. no. 274
[Abstract]
Background: It has been previously shown that palindromic sequences are frequently observed in proteins. However, our knowledge about their evolutionary origin and their possible importance is incomplete. Results: In this work, we tried to revisit this relatively neglected phenomenon. Several questions are addressed in this work. (1) It is known that there is a large chance of finding a palindrome in low complexity sequences (i.e. sequences with extreme amino acid usage bias). What is the role of sequence complexity in the evolution of palindromic sequences in proteins? (2) Do palindromes coincide with conserved protein sequences? If yes, what are the functions of these conserved segments? (3) In case of conserved palindromes, is it always the case that the whole conserved pattern is also symmetrical? (4) Do palindromic protein sequences form regular secondary structures? (5) Does sequence similarity of the two "sides" of a palindrome imply structural similarity? For the first question, we showed that the complexity of palindromic peptides is significantly lower than randomly generated palindromes. Therefore, one can say that palindromes occur frequently in low complexity protein segments, without necessarily having a defined function or forming a special structure. Nevertheless, this does not rule out the possibility of finding palindromes which play some roles in protein structure and function. In fact, we found several palindromes that overlap with conserved protein Blocks of different functions. However, in many cases we failed to find any symmetry in the conserved regions of corresponding Blocks. Furthermore, to answer the last two questions, the structural characteristics of palindromes were studied. It is shown that palindromes may have a great propensity to form α-helical structures. Finally, we demonstrated that the two sides of a palindrome generally do not show significant structural similarities. Conclusion: We suggest that the puzzling abundance of palindromic sequences in proteins is mainly due to their frequent concurrence with low-complexity protein regions, rather than a global role in the protein function. In addition, palindromic sequences show a relatively high tendency to form helices, which might play an important role in the evolution of proteins that contain palindromes. Moreover, reverse similarity in peptides does not necessarily imply significant structural similarity. This observation rules out the importance of palindromes for forming symmetrical structures. Although palindromes frequently overlap with conserved Blocks, we suggest that palindromes overlap with Blocks only by coincidence, rather than being involved with a certain structural fold or protein domain.
2007
Evolution of 'ligand-deffusion chreodes' on protein-surface models: A genetic-algorithm study
Marashi, S.-A., Kargar, M., Katanforoush, A., Abolhassani, H., Sadeghi, M.
(2007) Chemistry and Biodiversity, 4 (12), pp. 2766-2771.
[Abstract]
Lattice models have been previously used to model ligand diffusion on protein surfaces. Using such models, it has been shown that the presence of pathways ('chreodes') of consecutive residues with certain properties can decrease the number of steps required for the arrival of a ligand at the active site. In this work, we show that, based on a genetic algorithm, ligand-diffusion pathways can evolve on a protein surface, when this surface is selected for shortening the travel length toward the active site. Biological implications of these results are discussed.
Solvent accessibility, residue charge and residue volume, the three ingredients of a robust amino acid substitution matrix.
Goodarzi, H., Katanforoush, A., Torabi, N., Najafabadi, H.S.
(2007) Journal of Theoretical Biology, 245 (4), pp. 715-725.
[Abstract]
Cost measure matrices or different amino acid indices have been widely used for studies in many fields of biology. One major criticism of these studies might be based on the unavailability of an unbiased and yet effective amino acid substitution matrix. Throughout this study we have devised a cost measure matrix based on the solvent accessibility, residue charge, and residue volume indices. Performed analyses on this novel substitution matrix (i.e. solvent accessibility charge volume (SCV) matrix) support the uncontaminated nature of this matrix regarding the genetic code. Although highly similar to a number of previously available cost measure matrices, the SCV matrix results in a more significant optimality in the error-buffering capacity of the genetic code when compared to many other amino acid substitution matrices. Besides, a method to compare an SCV-based scoring matrix with a number of widely used matrices has been devised, the results of which highlights the robustness of this matrix in protein family discrimination.
2006
Detailed instantaneous ionization rate of H2+ in an intense laser field
Vafaee, M., Sabzyan, H., Vafaee, Z., Katanforoush, A.
(2006) Physical Review A 74, 043416.
Abstract
2003
Distributing Points on the Sphere, I
Katanforoush, A., Shahshahani, M.
(2003) Experimental Mathematics, 12 (2), pp. 199-209.
[Abstract]
We study four different methods for distributing points on the sphere and numerically analyze their relative merits with respect to certain metrics.
Conference Papers
2012
Classification of Gene Expression Data using Multiple Ranker Evaluators and Neural Network.
Roozbahani, Z., Katanforoush, A.
CICIS'12, 3rd Int. Conf. Contemporary Issues in Computer and Information Sciences, May 2012, IASBS, Zanjan, Iran
[Abstract]
Samples assayed by high-throughput microarray technologies challenge conventional Machine Learning techniques. The major issue is the number of attributes (genes) that is highly greater than the number of samples. In feature selection, we attempt to reduce the number of attributes to obtain the most effective genes. In the prediction scheme introduced in this paper, several feature selection methods are combined with an Artificial Neural Network (ANN) classifier. Initially, we exploit various evaluators to measure association between the gene expression rate and susceptibly categories. Then we rank genes based on each of measures and select a fixed number of top rank genes. To assess the performance of this method, we use a Multi Layer Perceptron (MLP) in which the input layer is associated to the genes commonly selected by all evaluators. We consider gene expression samples for Leukemia, Lymphoma and DLBCL to evaluate our method using Leave-one-out cross validation. Results show that our approach outperforms the predictive accuracy compared to other methods.
2010
A Genetic Algorithm for Haplotype Inference. (In Persian)
Katanforoush, A. Sadeghi, M., Eslahchi, Ch., Pezeshk, H.
10th Iranian Conference on Fuzzy System (IFS10), Shahid Beheshti University, July 2010.
[Abstract]
2008
Haplotype Block Partitioning and TagSNP Selection using Mutual Information. (Poster)
Katanforoush A., Sadeghi M., Pezeshk H.
ECCB'08, European Conference on Computational Biology, Sep 2008, Cagliari, Italy
[Abstract]
Most experimental and computational studies on high throughput SNP data are applicable only on a limited region of genome or limited subset of SNPs. Using mutual information of SNP pairs, we define the regions of high information consistency as haplotype blocks. Applying the dynamic programming approach with the proposed score on HapMap samples results wider chromosomal blocks than the ones by other works while admitting the hotspots of recombination. Also using an approach of the problem of dominating set, we develop an algorithm for tagSNP selection that outperforms on association studies.
Dissertations
2009
Computational Problems in Haplotype Recognition.
A. Katanforoush
Ph.D. Dissertation, Institute of Biochemistry and Biophysics, University of Tehran, Oct 2009.
Full dissertation: [PDF (in Persian)] [LaTeX sources (XePersian) ] Presentation slides: [PDF] [Open Office]
[Abstract]
Haplotypes, as the SNP genotypes in haploid phase, provide useful materials for genetic analyses. Two primary processes in computational haplotype study are to phase genotype data into haplotypes and to partition a chromosome into blocks based on haplotype samples of unrelated individuals. For problem of genotype phasing, we introduce a family of greedy procedures as a general approach finding nearly the best optimal solutions for the phasing problem under the model of maximum parsimony. Then we develop a hybrid method to infer haplotypes from genotype data by incorporating the greedy procedure into a Genetic Algorithm. Global partitioning based on pairwise associations of SNPs has not previously been used to define haplotype blocks within genomes. Here, we define an association index based on LD between SNP pairs. We set limits on the maximum acceptable proportion of independent pairs within all blocks and search for the partitioning with maximal proportion of associated SNP pairs. Essentially, this model is reduced to a constrained optimization problem, the solution of which is obtained by iterating a dynamic programming algorithm. We also introduce new assessment protocols to evaluate performance of haplotype block partitioning methods for different aspects and applications, including a definition of similarity for two block partitionings, a simulation process to assess the robustness of a block definition, the application in hotspots detection and an application in disease association studies. Resampling HapMap haplotypes under a block-based model of recombination shows that our algorithm is robust in reproducing the same partitioning for recombinant samples. Our algorithm performed better than previously reported models in a case-control association study aimed at mapping a single locus trait. Compared to methods of haplotype block partitioning, our algorithm performs best on detection of recombination hotspots.
1999
Knot removal for B-spline curves
A. Katanforoush
Master Thesis, Department of Mathematical Sciences, Sharif University of Technology, Sep 1999. (In Persian)
[PostScript] [FarsiTex sources]
[Abstract]
In this thesis, the problem of removing one inner knot from the knot sequence of a B-spline curve is discussed. The purpose of knot removal for a B-spline curve is to obtain a new B-spline curve defined on the knot sequence from which some knots are removed out of an initial sequence, so that the curve has the least difference with the initial curve with respect to some norms. By implementing some composition and decomposition algorithms of B-spline, based on the methods of removing a single knot, some applications can be considered such as the problem of knot ranking for joint removal of many knots as a special application, and also many practical applications like curve and surface fiiting, signal processing, image compression and approximation of functions out of spline space with free-knot sequence.