Automatic Prediction of Functional Residues from Sequence and Structural Information

Cilia, Elisa and Passerini, Andrea and Brunato, Mauro (2008) Automatic Prediction of Functional Residues from Sequence and Structural Information. Departmental Technical Report DISI-08-036. (Unpublished)

    One of the aims of modern bioinformatics is to discover the molecular mechanisms that govern protein function. This is a fundamental step in understanding the complex processes involved in living systems and could eventually allow us to correct dysfunctions. A protein may have several functions, which are determined by its primary structure, i.e. the sequence of amino acids that constitute it, and by the spatial arrangement of those amino acids (tertiary structure). Identifying protein function is a challenging problem, as it involves the combination of a large number of variables, most of which are still unknown. Automatic approaches for detecting protein functional sites are therefore needed. In this work we concentrate on the prediction of functional residues, i.e. residues that directly interact with the substrate. In its simplest formulation, the problem can be cast as a binary classification task at the residue level. Preliminary experiments showed that evolutionarily enriched sequence-based information alone achieves performance statistically indistinguishable from that of carefully crafted features extracted from 3D coordinates. While this makes functional residue prediction applicable to the much wider range of sequenced proteins whose 3D structure may be unknown, it also indicates that exploiting structural information in the automatic prediction of protein functional aspects is a non-trivial task. We show that modeling the physico-chemical properties of a residue's structural neighbourhood yields significant improvements, but further research is needed to fully exploit the information provided by the protein 3D structure.
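The abstract casts functional residue prediction as a binary classification task at the residue level, driven by evolutionarily enriched sequence information (the keywords mention conservation profiles and support vector machines). One common way to turn a per-residue conservation profile into classification examples is a sliding window over neighbouring positions. The sketch below assumes that encoding, a window half-width, zero padding at the chain ends, and +1/-1 labels; none of these details are specified in the abstract, and the function names `window_features` and `residue_examples` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def window_features(profile: Sequence[Sequence[float]], window: int = 3) -> List[List[float]]:
    """Encode each residue by concatenating the profile rows of its neighbours.

    profile: one row per residue; each row is a vector of conservation scores
             (hypothetical input, e.g. derived from a multiple alignment).
    window:  number of neighbours taken on each side (an assumed value).
    Positions falling off either end of the chain are zero-padded.
    """
    if not profile:
        return []
    width = len(profile[0])
    padding = [0.0] * width
    n = len(profile)
    features: List[List[float]] = []
    for i in range(n):
        row: List[float] = []
        # Window covers positions i-window .. i+window inclusive.
        for j in range(i - window, i + window + 1):
            row.extend(profile[j] if 0 <= j < n else padding)
        features.append(row)
    return features


def residue_examples(
    profile: Sequence[Sequence[float]],
    functional_positions: Set[int],
    window: int = 3,
) -> List[Tuple[List[float], int]]:
    """Pair each residue's window vector with a binary label:
    +1 if the residue is annotated as functional, -1 otherwise."""
    feats = window_features(profile, window)
    labels = [1 if i in functional_positions else -1 for i in range(len(profile))]
    return list(zip(feats, labels))


if __name__ == "__main__":
    # Toy 5-residue profile with 2 scores per position; residue 2 is "functional".
    toy = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    for x, y in residue_examples(toy, {2}, window=1):
        print(y, x)
```

Each residue becomes one fixed-length vector of size `(2 * window + 1) * width`, so the resulting examples could be passed to any binary classifier such as an SVM. This only illustrates the residue-level framing; the report's actual feature set, including its structural-neighbourhood features, is not described here.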

    Item Type: Departmental Technical Report
    Department or Research center: Information Engineering and Computer Science
    Subjects: Q Science > QA Mathematics > QA075 Electronic computers. Computer science
    Q Science > QH Natural history > QH301 Biology
    Uncontrolled Keywords: catalytic residue prediction, conservation profiles, support vector machine, protein active site, protein structure
    Report Number: DISI-08-036
    Repository staff approval on: 19 Sep 2008