Scaling properties of common statistical operators for gridded datasets

Title: Scaling properties of common statistical operators for gridded datasets
Publication Type: Journal Article
Year of Publication: 2007
Authors: Zender, C. S., & Mangalam, H.
Journal: International Journal of High Performance Computing Applications
Volume: 21
Pagination: 485-498
Date Published: 12/2007
Type of Article: Proceedings Paper
ISSN: 1094-3420
Accession Number: http://apps.isiknowledge.com/InboundService.do?Func=Frame&product=WOS&action=retrieve&SrcApp=EndNote&Init=Yes&SrcAuth=ResearchSoft&mode=FullRecord&UT=000250718500009
Keywords: analysis; climate system model; computational model; data; data access; geoscience; interface; netCDF; performance; scaling; Zender Modeling Lab
Abstract

An accurate cost model that accounts for dataset size and structure can help optimize geoscience data analysis. We develop and apply a computational model to estimate data analysis costs for arithmetic operations on gridded datasets typical of satellite- or climate model-origin. For these dataset geometries our model predicts data reduction scalings that agree with measurements of widely used geoscience data processing software, the netCDF Operators (NCO). I/O performance and library design dominate throughput for simple analysis (e.g. dataset differencing). Dataset structure can reduce analysis throughput ten-fold relative to same-sized unstructured datasets. We demonstrate algorithmic optimizations which substantially increase throughput for more complex, arithmetic-dominated analysis such as weighted-averaging of multi-dimensional data. These scaling properties can help to estimate costs of distribution strategies for data reduction in cluster and grid environments.

URL: pub/730
Alternate Journal: Int. J. High Perform. Comput. Appl.
ESS Associations
Research Area: Atmospheric Chemistry
Research Area: Physical Climate
Research Lab: Zender Research Group