Approximate query answering in sensor networks
  

There has been a great deal of interest in recent years in  developing systems to collect data from wireless sensor networks.
Applications include environmental monitoring, agriculture, industrial monitoring, and process control. Existing systems for
declarative querying in these environments typically collect data from all of the nodes at a regular rate to some sink node, where it
is combined and processed just as in a standard streaming database system. Clearly, this approach involves a large number of
transmissions and it is costly from the energy viewpoint.

We have studied the problem of efficiently providing approximate answers to user queries at the sink [1,2,3,4]. We develop a suite of novel techniques for predicting values sensed at the nodes, and for grouping together sensor nodes that produce similar data and for using representative members of each group to answer queries for the other constituents. Our main focus is on designing an adaptable query framework that is energy-efficient and that provides user-specifiable approximation bounds. As with most work on sensor networks, we focus on minimizing the amount of communication, as message transmission tends to be the dominant energy cost in most sensor systems.
Our approach is probabilistic: the sink does not forward the query requests submitted by the end user to each node in the
network, but answers queries using a global probabilistic model built at the sink to predict sensor readings. The global model is
derived from local models built at each node and transmitted to the sink. Each node monitors the quality of its data model and adapts it when needed by notifying the sink. Our approach, called SAF (for Similarity-based Adaptive Framework), uses time series
forecasting to dramatically reduce the number of transmissions relative to previous approximate approaches. The time series models
we use are simple linear models that are cheap to learn and easy to adapt, in the sense that the model can be frequently re-learned as
the data distribution changes. Our experimental results show that despite its simplicity our adaptable model is capable of predicting
data produced by real-world sensors measuring physical phenomena like ambient light, temperature, and humidity that evolve slowly
over time. In order to answer queries satisfying a certain user confidence, nodes have to periodically transmit their readings to the sink. SAF additionally reduces the amount of communication by organizing the network into clusters based on data similarity between nodes. Our novel definition of data similarity uses coefficients of local AR models stored at the sink rather the actual readings, which reduces energy consumption over techniques that directly compare data values. We derive an efficient clustering algorithm that is provably optimal in the number of clusters formed by the network. This is achieved by exploiting the linearity of the AR models, which allows us to transform the complex problem of computing and maintaining clusters into the simple problem of
grouping intervals of same width into larger intervals. Finally, SAF provides provably correct error bounds and allows the user to
dynamically tune answer quality to answer queries in an energy and resource efficient manner. This includes the possibility of tuning
the data stream rate at the sensor according to the user requirements. The high flexibility of SAF  can lead to further
extensions for additionally reducing transmissions. We validate the accuracy and the stability of our
models and clusters both analytically and through extensive experimental results based on traces of real data, which show their suitability.
SAF can be applied to environmental applications in which the sink analyzes and synthesizes the data stream produced at sensors, to
tracking applications in which outlier values detected at each sensor, or variations in the local data distribution can show the
presence of an object.

[1]  D. Tulone, S. Madden. PAQ: Time series forecasting for approximate query answering
in sensor networks. In Proc. of the 3rd European Workshop on Wireless Sensor Networks, pp. 21-37, Feb 2006.

[2]  D. Tulone, S. Madden. An energy-efficient querying framework in sensor networks for detecting node similarities.
Submitted to conference.

[3] D. Tulone, S. Madden. An energy-efficient querying framework in sensor networks for detecting node similarities.
Submitted to journal.

[4] D. Tulone. Mechanisms for energy conservation in wireless sensor networks. Ph.D. thesis, Department of Computer Science, Univeristy of Pisa, December 2005.