Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
A hybrid technique for estimating frequency moments over data streams
#1

Abstract.
The problem of estimating the kth frequency moment Fk, for any
non-negative integral value of k, over a data stream by looking at the items exactly
once as they arrive, was considered in a seminal paper by Alon, Matias and
Szegedy [1, 2]. They present a sampling based algorithm to estimate Fk where,
k 2, using space O(n1 1/k)). Coppersmith and Kumar [7] and [10], using
different methods, present algorithms for estimating Fk with space complexity
O
(n1 1/(k 1)). In this paper, we present an algorithm for estimating Fk with
space complexity O(n1 2/(k+1)), for k > 2, thereby, improving the space complexity
compared to the algorithms in [1, 2, 7, 10] for k 4.
1 Introduction
Data streams are characterized by large volumes of data that arrive rapidly and continuously.
Due to the volume of the data, it is desirable to design algorithms that estimate
metrics over the data streams, with sub-linear space complexity. One such metric is
frequency moments, studied in a seminal paper by Alon, Matias and Szegedy [1, 2]. In
this paper, we study and present novel algorithms for this problem.
We view a stream as a sequence of arrivals of items l, where, l is the identity of
an item. The items are assumed to draw their identities from the domain [N]. The frequency
of an item with identity l, denoted by fl, is the number of occurrences of l since
the inception of the stream. Thus, each arrival of an item increases its frequency by
1. The kth frequency moment of the stream, denoted by Fk, is defined as
P
l fk
l , for
k > 2 and k integral. This problem was first studied in a seminal paper by Alon, Matias
and Szegedy [1, 2]. It is interesting for several reasons. Historically, it is one of the very
first problems studied in the data streaming model and helped to consolidate the area
of data streaming algorithms. Secondly, the techniques presented in [1, 2] have led to
new solution techniques for several practical problems on data streams, for example,
maintenance of histograms [12] and maintenance of the top-k frequent items [6].
1.1 Prior Work
The kth moment estimation algorithm, presented in [1, 2] (which we call the AMS algorithm)
is based on sampling and estimates Fk, for k 2, to within any specified
approximation parameter 0 < < 1, and with confidence exceeding a userspecified
parameter < 1. The space complexity of the AMS algorithm

Download full report
http://googleurl?sa=t&source=web&cd=1&ve...bridFk.pdf&ei=k8dETtnnHYHKrAfI3qHwAw&usg=AFQjCNHXqq-ePQmxocd3piP1Mtcx8in9fA&sig2=fcsVH2cbDst9pWdOjdziYA
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

Powered By MyBB, © 2002-2024 iAndrew & Melroy van den Berg.