Wednesday, January 14, 2009

Too many rows for history!

In the legacy system " Histories" are used to store the bug count for the cross section of each owner, buglist and project each time the site is generated. This information is used in historical reporting to show how a person's, team's, or project's bug count for a given bug list has changed over time.
I sat down to write the DB import code for bug count histories tonight and noticed I didn't have a table for it. Suspicious...
So I looked at my old notes and discovered a calcuation I did a while back to figure out the number of rows I'd need in my DB.
The Math
Assuming 100 active sites/projects

(active project_people ~ 30,000) * (active bug lists ~ 1000) * (1 year of active data ~365 days) == 10,950,000,000 (Time to call Carl Sagan)

I did a few quick searches and found answers suggesting there was no theoretical limit to DB size as long as you have a nudlear powered CPU, infinate memory and a heat sink liquid cooled by the nothern Atlantic ocean.
BUT... They all talk about things in terms of millions of rows and keeping individual tables under 500 million rows to keep from encouraging Loki (god of fire and trickery) from laying waste to their application.
Additional review of my calculations shows:



Descriptionrow count% of DB
Sum of all rows except bug counts:428,0580.003909%
Sum of History Rows for 1 year:10,950,000,00099.9960909%

The purpose of the Central Controller was to have a central point for site management. Not to act as a historical data repository that will someday rival amazon.com's.

OK that's an exaggeration, but it occurs to me that if I store histories in the DB I'm leaving a ticking time bomb for whoever takes over the app. First it the central controller will start to slow down and about a 1.5 years after we deploy it will either crash the server or heat it up so hot that it will melt a hole through the outer mantle of the planet.
Either way, I'm not doing anyone any favors writing this import code.
The Solution

Given:
  • Histories have only one purpose, graphing historical bug counts.
  • I'm planning on replacing the graphing tools using a javascript based graphing tool
  • That graphing tool is going to want historical data in Javascript
I'm going to:
Not import histories into the DB. Instead I'll store the historical data in JSON as flat files on the web server with the other generated site content.
To do that I need to:
  1. Figure out the best Object Notation for the graphing tool I'm going to use.
  2. Write a one time script that writes converts the existing historical data into JSON files
  3. Teach the generator how to read and re-render those files each time it runs.
So I'm moving the histories import mechanism into the next phase when I work on the generator.

No comments: