Complexity of Context-free Grammars with Exceptions, and the inadequacy of grammars as models for XML and SGML

Rizzi, Romeo (2002) Complexity of Context-free Grammars with Exceptions, and the inadequacy of grammars as models for XML and SGML. UNSPECIFIED.

    The Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML) allow authors to better transmit the semantics of their documents by explicitly specifying the relevant structures in a document or class of documents by means of document type definitions (DTDs). Several authors have proposed to regard DTDs as extended context-free grammars expressed in a notation similar to extended Backus-Naur form. In addition, the SGML standard allows the semantics of content models (the right-hand sides of productions) to be modified by exceptions. Inclusion exceptions allow named elements to appear anywhere within the content of a content model, and exclusion exceptions preclude named elements from appearing in the content of a content model. Since XML does not allow exceptions, the problem of exception removal has received much interest recently. Motivated by this, Kilpelainen and Wood have proved that exceptions do not increase the expressive power of extended context-free grammars and that, for each DTD with exceptions, we can obtain a structurally equivalent extended context-free grammar. Since their argument was based on an exponential simulation, they also conjectured that an exponential blow-up in the size of the grammar is a necessary evil when purging exceptions. We prove their conjecture under the realistic assumption that NP-complete problems do not admit non-uniform polynomial-time algorithms. Kilpelainen and Wood also asked whether the parsing problem for extended context-free grammars with exceptions admits an efficient algorithmic solution. We show the NP-completeness of the following very basic problem: given a string w and a context-free grammar G (not even extended) with exclusion exceptions (no inclusion exceptions needed), decide whether w belongs to the language generated by G. Our results and arguments highlight the limitations of using extended context-free grammars as a model of SGML, especially when one is interested in understanding issues related to exceptions.
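
    As an illustration of what an exclusion exception does, the following Python sketch checks an already-parsed element tree for elements that occur where an exclusion forbids them. It is not taken from the report: the Element class, the EXCLUSIONS table and the violations function are hypothetical names, and the sketch assumes that an exclusion declared on an element applies to all of its descendants. It does not model content models or inclusion exceptions, and it says nothing about the hardness of the membership problem, which concerns finding a derivation rather than checking a given tree.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A node of an already-parsed document tree (hypothetical type)."""
    name: str
    children: list = field(default_factory=list)


# Hypothetical exclusion table: element name -> names banned in its content.
EXCLUSIONS = {"a": {"a"}, "pre": {"img"}}


def violations(node, banned=frozenset(), path=()):
    """Return the paths of elements that occur where an exclusion forbids them."""
    path = path + (node.name,)
    found = []
    if node.name in banned:
        found.append("/".join(path))
    # Assumption: exclusions declared on this element reach every descendant.
    inner = banned | EXCLUSIONS.get(node.name, set())
    for child in node.children:
        found.extend(violations(child, inner, path))
    return found


nested = Element("a", [Element("b", [Element("a")])])
print(violations(nested))
```

    Here the inner "a" is reported even though it is not a direct child of the outer "a", because the set of banned names is passed down the tree rather than checked only one level deep.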

    Item Type: Departmental Technical Report
    Department or Research center: Information Engineering and Computer Science
    Subjects: Q Science > QA Mathematics > QA075 Electronic computers. Computer science
    Uncontrolled Keywords: exceptions, context-free grammars, computational complexity, exponential blow-up, XML, SGML.
    Additional Information: Published in: "Markup Languages: Theory and Practice" 3 (1) 2001.
    Report Number: DIT-02-058
    Repository staff approval on: 21 Jan 2003
