From: Marcus G. Daniels
Subject: Re: [Swarm-Support] New code: Interpolator, time series input manager
Date: Sat, 17 May 2003 10:56:05 -0600
User-agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.4b) Gecko/20030516
Paul E. Johnson wrote:
Swarm can write and read hdf5 files. Did you ever experiment with that as a way to manage time series data? I have succeeded with the output of hdf5, but not the reading-in part. But I did not try with any vigor at all.
I recently wrote some HDF5-based tools for our finance data analysis group at SFI, and I'm pleased with the performance. We are studying a dataset of 350 million events over four years, cross-linked across five databases, for hundreds of stocks. Compared to text files, HDF5 provides random access to any event or timestamp, compression (a single stock can be 1.2 GB for 10 million cross-linked records), and portability across platforms (people in the group use Linux, Solaris, Windows, and Mac OS X). Alternative approaches can optimize for speed or for space, but HDF5 lets you tune the tradeoff between the two. In parallel with the data analysis, the modelers in our group are tweaking agent-based models that also read and write big data streams. (The simulations read data streams in order to place agents in a realistic trading environment. They write HDF5 so that the output can be compared with real datasets.)
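To make the speed versus space point concrete, here is a rough sketch of the general idea behind chunked, compressed storage. It is not HDF5 and not the SFI tools: it uses only the Python standard library, and the record layout (an integer timestamp plus a price), the class name ChunkedStore, and the parameters chunk_len and level are all made up for illustration. Records are packed into fixed-length chunks that are compressed independently, so reading one event means decompressing one chunk rather than the whole file.

```python
import bisect
import struct
import zlib

# Hypothetical event layout: 64-bit timestamp, 64-bit float price.
RECORD = struct.Struct("<qd")


class ChunkedStore:
    """Timestamp-sorted records held in independently compressed chunks."""

    def __init__(self, records, chunk_len=1024, level=6):
        self.chunk_len = chunk_len
        self.count = len(records)
        self.chunks = []       # compressed bytes, one entry per chunk
        self.first_stamp = []  # first timestamp of each chunk (lookup index)
        for start in range(0, len(records), chunk_len):
            block = records[start:start + chunk_len]
            raw = b"".join(RECORD.pack(t, p) for t, p in block)
            self.chunks.append(zlib.compress(raw, level))
            self.first_stamp.append(block[0][0])

    def _load(self, chunk_no):
        """Decompress one chunk into a list of (timestamp, price) tuples."""
        raw = zlib.decompress(self.chunks[chunk_no])
        return list(RECORD.iter_unpack(raw))

    def by_index(self, i):
        """Return record i, decompressing only the chunk that holds it."""
        if not 0 <= i < self.count:
            raise IndexError(i)
        chunk_no, offset = divmod(i, self.chunk_len)
        return self._load(chunk_no)[offset]

    def at_or_after(self, stamp):
        """Return the first record with timestamp >= stamp, or None."""
        if not self.chunks:
            return None
        # Last chunk whose first timestamp is strictly below stamp.
        chunk_no = max(bisect.bisect_left(self.first_stamp, stamp) - 1, 0)
        block = self._load(chunk_no)
        # (stamp,) sorts before any (stamp, price) tuple.
        pos = bisect.bisect_left(block, (stamp,))
        if pos < len(block):
            return block[pos]
        if chunk_no + 1 < len(self.chunks):
            return self._load(chunk_no + 1)[0]
        return None

    def compressed_size(self):
        """Total bytes used by the compressed chunks."""
        return sum(len(c) for c in self.chunks)
```

In this sketch, a small chunk_len makes single-event lookups cheaper but usually compresses worse, and a higher level trades write time for space. Those two knobs are the kind of tradeoff meant above; how a real HDF5 file should be laid out depends on the data and the access pattern.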