Defining and Measuring Data-Driven Quality Dimension of Staleness

Chayka, Oleksiy and Palpanas, Themis and Bouquet, Paolo (2012) Defining and Measuring Data-Driven Quality Dimension of Staleness. Trento : Università degli Studi di Trento.

    Abstract

    With the growing complexity of data acquisition and processing methods, there is an increasing demand for understanding which data is outdated and how to keep it as fresh as possible. Staleness is one of the key time-related data quality characteristics; it represents the degree of synchronization between data originators and the information systems that hold the data. However, there is currently no common, widely accepted notion of data staleness, nor are there methods for measuring it across a wide range of applications. Our work defines a data-driven notion of staleness for information systems with frequently updated data. For such data, we demonstrate an exponential smoothing method of staleness measurement that is efficient compared with naïve approaches using the same limited amount of memory and is based on averaging the frequency of updates. We present experimental results for staleness measurement algorithms run on the update history of Wikipedia articles.
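
    To make the idea in the abstract concrete, the following is a minimal sketch of how exponential smoothing can track update behaviour in constant memory. It is an illustration only, not the algorithm from the report: the class name `SmoothedUpdateTracker`, the smoothing factor `alpha`, the choice to smooth inter-update intervals rather than frequencies, and the rule "stale when the time since the last update exceeds the smoothed interval" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmoothedUpdateTracker:
    """Hypothetical tracker: keeps only two numbers per data item."""

    alpha: float = 0.3                     # smoothing factor (assumed value)
    last_update: Optional[float] = None    # timestamp of the latest update
    mean_interval: Optional[float] = None  # smoothed gap between updates

    def observe(self, timestamp: float) -> None:
        """Record an update and refresh the smoothed inter-update interval."""
        if self.last_update is not None:
            gap = timestamp - self.last_update
            if self.mean_interval is None:
                # The first observed gap seeds the estimate.
                self.mean_interval = gap
            else:
                # Standard exponential smoothing: recent gaps weigh more.
                self.mean_interval = (
                    self.alpha * gap + (1 - self.alpha) * self.mean_interval
                )
        self.last_update = timestamp

    def is_stale(self, now: float) -> bool:
        """Assumed rule: stale once an update is overdue by the estimate."""
        if self.last_update is None or self.mean_interval is None:
            return False  # not enough history to judge
        return now - self.last_update > self.mean_interval


tracker = SmoothedUpdateTracker(alpha=0.5)
for t in (0.0, 10.0, 30.0):  # illustrative update timestamps
    tracker.observe(t)
```

    With these timestamps the gaps are 10 and 20, so the smoothed interval is 0.5 * 20 + 0.5 * 10 = 15. A naïve alternative with the same memory footprint would keep a plain running mean of the gaps, which reacts more slowly when the update rate changes.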

    Item Type: Departmental Technical Report
    Department or Research center: Information Engineering and Computer Science
    Subjects: Q Science > QA Mathematics > QA075 Electronic computers. Computer science
    Q Science > QA Mathematics > QA076 Computer software
    Q Science > QA Mathematics > QA076 Data Base Management
    T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK7885 Computer Engineering
    Report Number: DISI-12-016
    Repository staff approval on: 24 Apr 2012 10:45
