A System for Watermarking Relational Databases
Rakesh Agrawal, Peter J. Haas, et al.
SIGMOD 2003
To maintain the accuracy of supervised learning models in the presence of evolving data streams, we provide temporally biased sampling schemes that weight recent data most heavily, with inclusion probabilities for a given data item decaying exponentially over time. We then periodically retrain the models on the current sample. We provide and analyze both a simple sampling scheme (T-TBS) that probabilistically maintains a target sample size and a novel reservoir-based scheme (R-TBS) that is the first to provide both control over the decay rate and a guaranteed upper bound on the sample size. The R-TBS and T-TBS schemes are of independent interest, extending the known set of unequal-probability sampling schemes. We discuss distributed implementation strategies; experiments in Spark show that our approach can increase machine learning accuracy and robustness in the face of evolving data.
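The idea of exponentially decaying inclusion probabilities can be illustrated with a minimal sketch. It is not taken from the paper and does not claim to reproduce T-TBS or R-TBS; it is a generic Bernoulli-style scheme in which every item already in the sample survives each time step with probability exp(-decay_rate) and every arriving item is admitted with a fixed probability. The names `decay_sample_step`, `inclusion_probability`, `decay_rate`, and `accept_prob` are assumptions introduced for illustration only.

```python
import math
import random


def decay_sample_step(sample, batch, decay_rate, accept_prob, rng):
    """Advance a time-biased sample by one step (illustrative sketch only).

    Each item already in the sample is retained with probability
    exp(-decay_rate); each item in the arriving batch is admitted with
    probability accept_prob. Both parameters are hypothetical knobs.
    """
    retain_prob = math.exp(-decay_rate)
    survivors = [item for item in sample if rng.random() < retain_prob]
    admitted = [item for item in batch if rng.random() < accept_prob]
    return survivors + admitted


def inclusion_probability(age, decay_rate, accept_prob):
    """Probability that an item which arrived `age` steps ago is in the sample."""
    return accept_prob * math.exp(-decay_rate * age)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = []
    for t in range(50):
        batch = [(t, i) for i in range(100)]  # 100 synthetic items per step
        sample = decay_sample_step(sample, batch, 0.1, 0.2, rng)
    print(len(sample))  # sample size fluctuates from run to run
```

Under this sketch, with a constant batch size b the expected sample size settles near accept_prob * b / (1 - exp(-decay_rate)), so the size is controlled only on average. A scheme of this kind offers no hard upper bound on the sample size, which is the guarantee the abstract attributes to the reservoir-based R-TBS scheme.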