TOOLS FOR PREDICTION AND ANALYSIS OF PROTEIN-CODING GENE STRUCTURE
Hamming-Clustering method for TATA-box signal prediction in eukaryotic genes
Gene expression is regulated by several kinds of short nucleotide domains, which can either activate or terminate the transcription process. To predict signal sites in the 5' regions of genes, we applied the Hamming-Clustering (HC) network to the recognition of the TATA-box and to the determination of the transcription initiation site in DNA sequences. This approach employs a technique derived from the synthesis of digital networks to generate prototypes, or rules, which can be analyzed directly or used to construct a final neural network.
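The idea of matching a sequence window against a binary prototype can be illustrated with a minimal Python sketch. It is not the published HC implementation: the 2-bit nucleotide code, the hand-written prototype with don't-care bits ("*"), and the function names encode, hamming and best_match are all assumptions made for illustration. In HC the prototypes would be generated from training data rather than written by hand.

```python
# Assumed 2-bit code per nucleotide; the encoding used in the HC papers may differ.
CODE = {"A": "00", "C": "01", "G": "10", "T": "11"}


def encode(seq):
    """Turn a DNA string into a bit string using the assumed 2-bit code."""
    return "".join(CODE[base] for base in seq.upper())


def hamming(bits, prototype):
    """Count mismatching bits; '*' in the prototype is a don't-care position."""
    if len(bits) != len(prototype):
        raise ValueError("bit strings must have equal length")
    return sum(1 for b, p in zip(bits, prototype) if p != "*" and b != p)


def best_match(seq, prototype):
    """Slide a window over seq and return (position, distance) of the window
    closest to the prototype. Ties go to the leftmost window."""
    width = len(prototype) // 2  # two bits per nucleotide
    scored = [
        (hamming(encode(seq[i:i + width]), prototype), i)
        for i in range(len(seq) - width + 1)
    ]
    dist, pos = min(scored)
    return pos, dist


# Hypothetical prototype: T A T A, any base, A (the fifth base is don't-care).
PROTOTYPE = encode("TATA") + "**" + encode("A")

position, distance = best_match("GGCCTATAAAGGC", PROTOTYPE)
```

Here position is the 0-based start of the closest window and distance is the number of mismatching bits outside the don't-care positions. Sequences containing symbols other than A, C, G and T (for example N) are not handled by this sketch and raise a KeyError.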
The full set of eukaryotic genes (1252 entries) from the Eukaryotic Promoter Database (EPD rel. 42) was used to train recognition of the TATA-box signal and the transcription initiation site. A set of plant genes was used to test the validity of the Hamming-Clustering network approach. The results show that the Hamming-Clustering method is applicable to functional signal prediction.
Milanesi, L., Muselli, M., Arrigo, P. (1996) Hamming Clustering method for signals prediction in 5' and 3' regions of eukaryotic genes. Comput. Applic. Biosci., 12 (5), pp. 399-404.
Milanesi, L., Arrigo, P., Muselli, M. (1995) Recognition of Poly-A signals with Hamming Clustering. In: "Proceedings of the Third International Conference on Bioinformatics, Supercomputing and Complex Genome Analysis" (H.A. Lim, J.W. Fickett, C.R. Cantor and R.J. Robbins, eds.), World Scientific Publishing, Singapore, pp. 461-466.
Milanesi, L. and Rogozin, I.B. (1998) Prediction of human gene structure. In: "Guide to Human Genome Computing" (2nd ed.) (M.J. Bishop, ed.), Academic Press, Cambridge, pp. 215-259.
Milanesi, L., D'Angelo, D., Rogozin, I.B. (1999) GeneBuilder: interactive in silico prediction of genes structure. Bioinformatics (in press - BIO98N149).