Robustness and Statistical Significance of PAM-like Matrices for Cognate Identification

Delmestri, Antonella and Cristianini, Nello (2010) Robustness and Statistical Significance of PAM-like Matrices for Cognate Identification. UNSPECIFIED. (Submitted)


    This paper tests how the size of the training dataset influences a recently proposed orthographic learning system, inspired by biological sequence analysis and successfully applied to cognate identification. The system automatically aligns a given set of cognate pairs to produce a meaningful training dataset, learns substitution parameters from it using a PAM-like technique, and utilises them to recognise cognate pairs. The results show no difference in performance whether the system is trained with about 650 cognate pairs extracted from 6 Indo-European languages or with about 62,000 cognate pairs extracted from 76 Indo-European languages. In both cases the system outperforms all comparable orthographic and phonetic methods previously proposed in the literature. The paper also investigates the statistical significance of these results when compared with earlier proposals. The outcome confirms that the performance reached by the system with either training dataset is significantly higher than that achieved by all the other methods. The size of the training dataset thus seems to influence neither the accuracy nor the statistical significance of this learning system, which needs only a very small amount of data to reach an outstanding performance.
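
To make the idea of learning substitution parameters from aligned cognate pairs concrete, the following is a minimal sketch in Python. It is not the PAM-like procedure of the report, whose details are not given in this abstract; it swaps in a plain log-odds estimate (observed versus expected frequency of aligned character pairs). The function names `log_odds_scores` and `pair_score`, the `-` gap symbol, and the `unseen` default score are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs, gap="-"):
    """Estimate symmetric log-odds scores from pre-aligned word pairs.

    Assumption: each pair is two equal-length strings in which `gap`
    marks an alignment gap. Gapped columns are skipped.
    """
    pair_counts = Counter()
    char_counts = Counter()
    for a, b in aligned_pairs:
        for x, y in zip(a, b):
            if x == gap or y == gap:
                continue
            pair_counts[tuple(sorted((x, y)))] += 1  # unordered pair
            char_counts[x] += 1
            char_counts[y] += 1

    total_pairs = sum(pair_counts.values())
    total_chars = sum(char_counts.values())
    scores = {}
    for (x, y), n in pair_counts.items():
        observed = n / total_pairs
        expected = (char_counts[x] / total_chars) * (char_counts[y] / total_chars)
        if x != y:
            expected *= 2  # an unordered pair can occur in either order
        scores[(x, y)] = math.log2(observed / expected)
    return scores


def pair_score(a, b, scores, gap="-", unseen=-1.0):
    """Sum substitution scores over the columns of an aligned pair.

    Assumption: gapped columns and unseen substitutions both receive
    the fixed score `unseen`.
    """
    total = 0.0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            total += unseen
        else:
            total += scores.get(tuple(sorted((x, y))), unseen)
    return total
```

Under these assumptions, a candidate pair would be judged a likely cognate when `pair_score` exceeds some threshold chosen on held-out data; both the threshold and the alignment step that produces the input pairs are outside this sketch.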

    Item Type: Departmental Technical Report
    Department or Research center: Information Engineering and Computer Science
    Subjects: Q Science > QA Mathematics > QA075 Electronic computers. Computer science
    Q Science > QA Mathematics > QA076 Computer software
    Uncontrolled Keywords: Cognate identification, substitution matrices, string similarity measures
    Report Number: DISI-10-048
    Repository staff approval on: 27 Oct 2010
