Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Lower bounds on frequency estimation of data streams
#1

Abstract.
We consider a basic problem in the general data streaming model, namely, to estimate
a vector f 2 Zn that is arbitrarily updated (i.e., incremented or decremented) coordinatewise.
The estimate f 2 Zn must satisfy k f fk1 _ _kfk1, that is, 8i ( fi fi _ _kfk1). It is
known to have O(_ 1) randomized space upper bound [4], (_ 1 log(_n)) space lower bound
[2] and deterministic space upper bound of (_ 2) bits.1 We show that any deterministic
algorithm for this problem requires space (_ 2(logkfk1)) bits.
1 Introduction
A data stream _ over the domain [1, n] = {1, 2, . . . , n} is modeled as a sequence of records of
the form (pos, i, _v), where, pos is the current sequence index, i 2 [1, n] and _v 2 {+1, 1}.
Here, _v = 1 signifies an insertion of an instance of i and _v = 1 signifies a deletion
of an instance of i. For each data item i 2 [1, n], P its frequency (freq _)i is defined as
(pos,i,_v) 2 stream _v. The size of _ is defined as _ = max{kfreq _0k1 _0 prefix of _}. In
this paper, we consider the general stream model, where, the n-dimensional frequency vector
freq _ 2 Zn. The data stream model of processing permits online computations over the
input sequence using sub-linear space. The data stream computation model has proved to
be a viable model for a number of application areas, such as network monitoring, databases,
financial data processing, etc..
We consider the problem ApproxFreq(_): given a data stream _, return f, such that
err( f, freq _) _ _, where, the function err is given by (1). Equivalently, the problem may
be formulated as: given i 2 [1, n], return fi such that fi (freq _)i _ _ kfreq _k1, where,
kfk1 =
P
i2[1,n] fi .
err( f, f) def = k f fk1
kfk1
_ _ . (1)
The problem ApproxFreq(_) is of fundamental interest in data streaming applications.
For general streams, this problem is known to have a space lower bound of (_ 1 log(n_))
[2], a randomized space upper bound of O (_ 1) [4], and a deterministic space upper bound
of O(_ 2) bits [7]. For insert-only streams (i.e., freq _ _ 0), there exist deterministic algorithms
that use O((_ 1)(log(mn)) space [5, 11, 12]; however extensions of these algorithms
to handle deletions in the stream are not known.
Mergeability. Data summary structures for summarizing data streams for frequency dependent
computations (e.g., approximate frequent items, frequency moments, etc.; formally
defined in Section 2) typically exhibit the property of arbitrary mergeability. If D is a data
structure for processing a stream and Dj , j = 1, . . . , k for k arbitrary, be the respective current
state of the structure after processing streams Sj , then, there exists a simple operation
Merge such that Merge(D1, . . . ,Dk) reconstructs the state of D that would be obtained

Download full report
http://googleurl?sa=t&source=web&cd=1&ve...r-full.pdf&ei=57BETr22GYyHrAec7P3ZAw&usg=AFQjCNE-4MPxyeO9sHG39a5uKQ0cArlbKQ&sig2=qmJbJyAylJMe08MNVemReA
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

Powered By MyBB, © 2002-2024 iAndrew & Melroy van den Berg.