
Re: looking for opinions on storage issues




Hello cwest,

You should keep your database raw and calculate the split-adjusted data on
the fly.  Just incorporate a nested hash table keyed by symbol name, where
each symbol points to a second hash table whose keys are that symbol's
split dates and whose values are the corresponding split factors, and
you're set.  Hash tables are fast and reside in memory, with periodic
serialization to disk for backup and reloading.  When a user requests a
dataset, the adjustments made through the hash tables are transparent to
them, and the adjusted set can stay in memory until the user is done with
it.  Your resources will not be overwhelmed.
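As a minimal sketch of that layout (symbol -> {split date -> split
factor}), with all symbols, dates, and factors made up for illustration:

```python
from datetime import date

# Outer hash table keyed by symbol; each value is an inner hash table
# mapping split date -> split factor.  All entries here are hypothetical.
splits = {
    "MSFT": {date(2003, 2, 18): 2.0},                           # 2-for-1
    "XYZ":  {date(2001, 6, 1): 3.0, date(2003, 9, 15): 2.0},
}

def adjust(symbol, bar_date, raw_price):
    """Apply, on the fly, every split that occurred after bar_date."""
    factor = 1.0
    for split_date, split_factor in splits.get(symbol, {}).items():
        if split_date > bar_date:
            factor *= split_factor
    return raw_price / factor

# A $54.00 raw close from before the 2-for-1 split adjusts to $27.00.
print(adjust("MSFT", date(2003, 1, 2), 54.0))
```

Lookups are a couple of dictionary probes plus a loop over a handful of
split dates per symbol, so the per-request cost stays tiny even with many
simultaneous users.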

Avoid all unnecessary disk access, because it's slow under Windows.  I
wouldn't get involved in maintaining dual databases, or get into a
situation where you have to rebuild your database every time you discover
that the split factor for a particular instrument on a particular date is
wrong (this will always happen).  Just update/correct the hash table data
structure in memory instead, then serialize the whole thing to disk,
rather than writing each corrected value to disk individually, which is
slow.
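A sketch of that correction-plus-backup cycle, assuming Python's pickle
module for the serialization step (file name and data are illustrative):

```python
import pickle
from datetime import date

# Hypothetical table where a bad split factor has been discovered.
splits = {"XYZ": {date(2003, 9, 15): 3.0}}   # should be 2-for-1

# Correct it in the in-memory structure; no database rebuild needed.
splits["XYZ"][date(2003, 9, 15)] = 2.0

# Periodic backup: one serialized blob, not many individual row writes.
with open("splits.pkl", "wb") as f:
    pickle.dump(splits, f)

# On restart, reload the whole table back into memory.
with open("splits.pkl", "rb") as f:
    restored = pickle.load(f)
```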

-F

Friday, December 26, 2003, 2:19:40 PM, you wrote:

c> This is off topic to actually trading, but there are probably a few
c> folks on the list that would have some input which would help me address
c> a design issue for a data repository (www.ufdb.com). 

c> Keeping track of and applying stock splits is error prone, which is why
c> several data vendors prefer to store data in its raw format and store
c> split information separately. However, adopting this approach for an
c> on-demand historical data service implies that before split-adjusted
c> data can be retrieved, the splits have to be applied to it. That's
c> compute intensive if done on the fly when several users are retrieving
c> data almost simultaneously, and resources could be overwhelmed.

c> Deploying servers capable enough to make any delay in on-demand
c> retrieval of on-the-fly split-adjusted data imperceptible probably
c> isn't an economic option, given the cost of software licensing -
c> Enterprise versions of Windows Server and SQL Server are priced per
c> processor (clustered 64-bit quad processors, for example). So a design
c> paradox arises: a trade-off between error-prone split tracking and the
c> cost of split adjustment.

c> The solution seems to be to store historical data in both formats -
c> split adjusted as of today, and raw. However, that implicitly doubles
c> the size of the db (for stocks), which has flow-on effects on local,
c> redundant, colocated, and backup storage, on bandwidth usage, and on
c> performance if all of the tables are on the same computer.

c> In short, if you had to make a decision about the above, what would you
c> do?

c> Thanks in advance for any suggestions
c> Colin West





-- 
Best regards,
 Frank                            mailto:r4_6fpen8@xxxxxxxxxxxxx