#
Publications

2011

Eslahchi, Ch., Katanforoush, A., Pezeshk, H., Afzaly, N.

(2011) Iranian Journal of Biotechnology, 9 (4), pp. 281-289.

Single Nucleotide Polymorphisms (SNPs) are the most usual form of polymorphism in human genome. Analyses of genetic variations have revealed that individual genomes share common SNP-haplotypes. Particular pattern of these common variations forms a block-like structure on human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplotype blocks using samples of individual genomes. We give a rigorous definition of the quality of the partitioning of haplotypes into blocks and describe a greedy algorithm for finding the proper partitioning in case of perfect and semi-perfect phylogeny. It is shown that the minimum number of tagSNPs in a haplotype block of Perfect Phylogeny can be obtained by a polynomial time algorithm. We apply the algorithm to haplotype data of human chromosome 21 and compare the results with other previously developed methods through simulations. The results demonstrate that our algorithm outperforms the conventional implementation of the Four Gamete Test approach which is the only available method for haplotype block partitioning based on Perfect Phylogeny.

Paylakhi, SH., Fan, JB., Mehrabian, M., Sadeghizadeh, M., Yazdani, S., Katanforoush, A., Kanavi, MR., Ronaghi, M., Elahi, E.

(2011) Molecular Vision, 17, pp. 1209-1221.

2009

Katanforoush, A., Sadeghi, M., Pezeshk, H., Elahi, E.

(2009) BMC Bioinformatics, 10, art. no. 1471, p. 269.

**Background:** Global partitioning based on pairwise associations of SNPs has not previously been
used to define haplotype blocks within genomes. Here, we define an association index based on LD
between SNP pairs. We use the Fisher's exact test to assess the statistical significance of the LD
estimator. By this test, each SNP pair is characterized as associated, independent, or not-
statistically-significant. We set limits on the maximum acceptable proportion of independent pairs
within all blocks and search for the partitioning with maximal proportion of associated SNP pairs.
Essentially, this model is reduced to a constrained optimization problem, the solution of which is
obtained by iterating a dynamic programming algorithm.
**Results:** In comparison with other methods, our algorithm reports blocks of larger average size.
Nevertheless, the haplotype diversity within the blocks is captured by a small number of tagSNPs.
Resampling HapMap haplotypes under a block-based model of recombination showed that our
algorithm is robust in reproducing the same partitioning for recombinant samples. Our algorithm
performed better than previously reported models in a case-control association study aimed at
mapping a single locus trait, based on simulation results that were evaluated by a block-based
statistical test. Compared to methods of haplotype block partitioning, we performed best on
detection of recombination hotspots.
**Conclusion:** Our proposed method divides chromosomes into the regions within which allelic
associations of SNP pairs are maximized. This approach presents a native design for dimension
reduction in genome-wide association studies. Our results show that the pairwise allelic association
of SNPs can describe various features of genomic variation, in particular recombination hotspots.

2008

Sheari, A., Kargar, M., Katanforoush, A., Arab, S., Sadeghi, M., Pezeshk, H., Eslahchi, C., Marashi, S.-A.

(2008) BMC Bioinformatics, 9, art. no. 274

**Background:** It has been previously shown that palindromic sequences are frequently observed
in proteins. However, our knowledge about their evolutionary origin and their possible importance
is incomplete.
**Results:** In this work, we tried to revisit this relatively neglected phenomenon. Several questions
are addressed in this work. (1) It is known that there is a large chance of finding a palindrome in
low complexity sequences (i.e. sequences with extreme amino acid usage bias). What is the role of
sequence complexity in the evolution of palindromic sequences in proteins? (2) Do palindromes
coincide with conserved protein sequences? If yes, what are the functions of these conserved
segments? (3) In case of conserved palindromes, is it always the case that the whole conserved
pattern is also symmetrical? (4) Do palindromic protein sequences form regular secondary
structures? (5) Does sequence similarity of the two "sides" of a palindrome imply structural
similarity? For the first question, we showed that the complexity of palindromic peptides is
significantly lower than randomly generated palindromes. Therefore, one can say that palindromes
occur frequently in low complexity protein segments, without necessarily having a defined function
or forming a special structure. Nevertheless, this does not rule out the possibility of finding
palindromes which play some roles in protein structure and function. In fact, we found several
palindromes that overlap with conserved protein Blocks of different functions. However, in many
cases we failed to find any symmetry in the conserved regions of corresponding Blocks.
Furthermore, to answer the last two questions, the structural characteristics of palindromes were
studied. It is shown that palindromes may have a great propensity to form α-helical structures.
Finally, we demonstrated that the two sides of a palindrome generally do not show significant structural similarities.
**Conclusion:** We suggest that the puzzling abundance of palindromic sequences in proteins is
mainly due to their frequent concurrence with low-complexity protein regions, rather than a global
role in the protein function. In addition, palindromic sequences show a relatively high tendency to
form helices, which might play an important role in the evolution of proteins that contain
palindromes. Moreover, reverse similarity in peptides does not necessarily imply significant
structural similarity. This observation rules out the importance of palindromes for forming
symmetrical structures. Although palindromes frequently overlap with conserved Blocks, we
suggest that palindromes overlap with Blocks only by coincidence, rather than being involved with
a certain structural fold or protein domain.

2007

Marashi, S.-A., Kargar, M., Katanforoush, A., Abolhassani, H., Sadeghi, M.

(2007) Chemistry and Biodiversity, 4 (12), pp. 2766-2771.

Lattice models have been previously used to model ligand diffusion on protein surfaces. Using such
models, it has been shown that the presence of pathways ('chreodes') of consecutive residues with
certain properties can decrease the number of steps required for the arrival of a ligand at the active site.
In this work, we show that, based on a genetic algorithm, ligand-diffusion pathways can evolve on a
protein surface, when this surface is selected for shortening the travel length toward the active site.
Biological implications of these results are discussed.

Goodarzi, H., Katanforoush, A., Torabi, N., Najafabadi, H.S.

(2007) Journal of Theoretical Biology, 245 (4), pp. 715-725.

Cost measure matrices or different amino acid indices have been widely used for studies in many fields of biology. One major criticism
of these studies might be based on the unavailability of an unbiased and yet effective amino acid substitution matrix. Throughout this
study we have devised a cost measure matrix based on the solvent accessibility, residue charge, and residue volume indices. Performed
analyses on this novel substitution matrix (i.e. solvent accessibility charge volume (SCV) matrix) support the uncontaminated nature of
this matrix regarding the genetic code. Although highly similar to a number of previously available cost measure matrices, the SCV
matrix results in a more significant optimality in the error-buffering capacity of the genetic code when compared to many other amino
acid substitution matrices. Besides, a method to compare an SCV-based scoring matrix with a number of widely used matrices has been
devised, the results of which highlights the robustness of this matrix in protein family discrimination.

2006

Vafaee, M., Sabzyan, H., Vafaee, Z., Katanforoush, A.

(2006) Physical Review A 74, 043416.

2003

Katanforoush, A., Shahshahani, M.

(2003) Experimental Mathematics, 12 (2), pp. 199-209.

We study four different methods for distributing points on
the sphere and numerically analyze their relative merits with
respect to certain metrics.

2012

Roozbahani, Z., Katanforoush, A.

CICIS'12, 3rd Int. Conf. Contemporary Issues in Computer and Information Sciences, May 2012, IASBS, Zanjan, Iran

Samples assayed by high-throughput microarray technologies challenge conventional Machine Learning techniques. The major issue is the number of attributes (genes) that is highly greater than the number of samples. In feature selection, we attempt to reduce the number of attributes to obtain the most effective genes. In the prediction scheme introduced in this paper, several feature selection methods are combined with an Artificial Neural Network (ANN) classifier. Initially, we exploit various evaluators to measure association between the gene expression rate and susceptibly categories. Then we rank genes based on each of measures and select a fixed number of top rank genes. To assess the performance of this method, we use a Multi Layer Perceptron (MLP) in which the input layer is associated to the genes commonly selected by all evaluators. We consider gene expression samples for Leukemia, Lymphoma and DLBCL to evaluate our method using Leave-one-out cross validation. Results show that our approach outperforms the predictive accuracy compared to other methods.

2010

Katanforoush, A. Sadeghi, M., Eslahchi, Ch., Pezeshk, H.

10th Iranian Conference on Fuzzy System (IFS10), Shahid Beheshti University, July 2010.

2008

Katanforoush A., Sadeghi M., Pezeshk H.

ECCB'08, European Conference on Computational Biology, Sep 2008, Cagliari, Italy

Most experimental and computational studies on high throughput SNP data are applicable only on a limited region of genome or limited subset of SNPs. Using mutual information of SNP pairs, we define the regions of high information consistency as haplotype blocks. Applying the dynamic programming approach with the proposed score on HapMap samples results wider chromosomal blocks than the ones by other works while admitting the hotspots of recombination. Also using an approach of the problem of dominating set, we develop an algorithm for tagSNP selection that outperforms on association studies.

2009

A. Katanforoush

Ph.D. Dissertation, Institute of Biochemistry and Biophysics, University of Tehran, Oct 2009.

Haplotypes, as the SNP genotypes in haploid phase, provide useful materials for genetic analyses. Two primary processes in computational haplotype study are to phase genotype data into haplotypes and to partition a chromosome into blocks based on haplotype samples of unrelated individuals.
For problem of genotype phasing, we introduce a family of greedy procedures as a general approach finding nearly the best optimal solutions for the phasing problem under the model of maximum parsimony. Then we develop a hybrid method to infer haplotypes from genotype data by incorporating the greedy procedure into a Genetic Algorithm.
Global partitioning based on pairwise associations of SNPs has not previously been used to define haplotype blocks within genomes. Here, we define an association index based on LD between SNP pairs. We set limits on the maximum acceptable proportion of independent pairs within all blocks and search for the partitioning with maximal proportion of associated SNP pairs. Essentially, this model is reduced to a constrained optimization problem, the solution of which is obtained by iterating a dynamic programming algorithm.
We also introduce new assessment protocols to evaluate performance of haplotype block partitioning methods for different aspects and applications, including a definition of similarity for two block partitionings, a simulation process to assess the robustness of a block definition, the application in hotspots detection and an application in disease association studies.
Resampling HapMap haplotypes under a block-based model of recombination shows that our algorithm is robust in reproducing the same partitioning for recombinant samples. Our algorithm performed better than previously reported models in a case-control association study aimed at mapping a single locus trait. Compared to methods of haplotype block partitioning, our algorithm performs best on detection of recombination hotspots.

1999

A. Katanforoush

Master Thesis, Department of Mathematical Sciences, Sharif University of Technology, Sep 1999. *(In Persian)*

In this thesis, the problem of removing one inner knot from the knot
sequence of a B-spline curve is discussed.
The purpose of knot removal for a B-spline curve is to obtain a new B-spline
curve defined on the knot sequence from which some knots are removed
out of an initial sequence, so that the curve has the least difference
with the initial curve with respect to some norms.
By implementing some composition and decomposition algorithms of B-spline,
based on the methods of removing a single knot, some applications can be
considered such as the problem of knot ranking for joint removal of many knots
as a special application, and also many practical applications like curve
and surface fiiting, signal processing, image compression and approximation
of functions out of spline space with free-knot sequence.