Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Simpler algorithm for estimating frequency moments of data streams
#1

Abstract
The problem of estimating the kth frequency moment Fk over a data stream by looking at the items exactly once as they arrive was posed in [1, 2]. A succession of algorithms have been proposed for this problem [1, 2, 6, 8, 7]. Recently, Indyk and Woodru [11] have presented the rst algorithm for estimating Fk, for k > 2, using space O (n1 2=k), matching the space lower bound (up to poly-logarithmic factors) for this problem [1, 2, 3, 4, 13] (n is the number of distinct items occurring in the stream.) In this paper, we present a simpler 1-pass algorithm for estimating Fk.
1 Introduction
Data streaming systems have many natural applica- tions, such as database systems, network monitoring, sensor networks, RF-id data management, etc.. These applications are characterized by rapidly arriving vo- luminous data which makes it di cult to store the data in its entirety for either online processing or post- processing. Therefore, there has been a substantial interest in the design of algorithms that process data streams using single-pass (or online) algorithms that re- quire sub-linear space. We view a data stream as a sequence of arrivals of the form (i; v), where, i is the identity of an item that is a member of the domain D = f0; 1; : : : ;N 1g, and v is the change in the frequency of the item. The value v may be either greater than or less than zero; v 1 signi es v insertions of the item i, and v 1 signi es v deletions of i. The frequency of an item i is denoted by fi and is de ned as the sum of the changes to its frequency since the inception of the stream, that is, fi = P (i;v) appears in stream v. The kth frequency moment of the stream, denoted by Fk, is de ned as Fk = P i fk i . The problem is to design an algorithm, parameterized by accuracy parameter and con dence parameter that returns an estimate ^ Fk of Fk, for k > 2, in a single pass over the data stream, such that, Pr n j ^ Fk Fkj Fk o 1 . Let n denote the number of items in the stream with positive frequencies and let m denote P i2D fi. The problem was introduced in [1, 2] who also present the rst sub-linear space algorithm with space complexity O (n1 1=k). (We say that f(n) is O (g(n)) if f(n) = O 1 O(1) (logm)O(1)(log n)O(1)g(n) .) [6, 8] present algorithms with space complexity O (n1 1=(k 1)) and [7] presents an algorithm with space complexity O (n1 2=(k+1)). A space lower bound of (n1 2 k ) is shown for this problem in a series of contributions [1, 2, 3, 4] (see[13] for a closely related problem). Recently, Indyk and Woodru [11] have presented the rst algorithm for estimating Fk using space O(n1 2=k), matching the space lower bound (up to poly-logarithmic factors) for this problem [4]. The space complexity of the algorithm of Indyk and Woodru has high constants and poly-logarithmic factors. Speci cally, their 2-pass algorithm has space complexity of O( 1 12 n1 2 k (log2 n)(log6m)). The 1-pass algorithm that is derived from this algorithm further multiplies the constant and poly-logarithmic factors. In this paper, we present a simpler algorithm for estimating Fk, for k > 2, whose space complexity is O( k2 2+4=k n1 2 k (log2m)(logm + log n)). Broadly speak- ing, we use the seminal idea of Indyk and Woodru [11] to classify items into groups, based on frequency. How- ever, Indyk and Woodru de ne groups whose bound- aries are randomized; this technique contributes to the complexity of their algorithm. In our algorithm, the group boundaries are deterministic. Our analysis is sim- pler and uses the more traditional approach of directly calculating the expectation and the variance of the es- timator for Fk. Finally, our algorithm is naturally a one-pass algorithm. The remainder of the paper is organized as follows. In Section 2, we brie y review the Countsketch algorithm[5] and the estimator for the residual second moment [9]. The data structure for the Fk estimator is presented in Section 3. The analysis of the estimator is presented in Section 4. Finally, we conclude in Section 6.

Download full report
http://googleurl?sa=t&source=web&cd=1&ve...soda06.pdf&ei=38BETuGmCcyurAfHvMzpAw&usg=AFQjCNFdzWexPT610Jv8sFQnZx4CmA2waQ&sig2=hQOTJ1DV81DzCZ0DjavQGA
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

Powered By MyBB, © 2002-2024 iAndrew & Melroy van den Berg.