Thursday, June 12, 2008

Back to Work After Moving

I'm back on the job after moving to a new town. My thesis adviser gave me four action items a while back. Here are the results:

1) Send Schema
http://docs.google.com/Doc?docid=dfn4hjr3_156gsj5k89b&hl=en

I posted this once before, but have since made one major update.

I'm leaning towards generating the graphing data dynamically from the DB. This means I need to store the bug count for every combination of person and bug list at least once a day. See the "Table: project_person_bug_list_count" section of the schema document for details.
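
To make the idea concrete, here is a minimal sketch of storing one snapshot per day and reading it back as a graph series. Only the table name comes from the schema document. The column names, the record_count and graph_series helpers, and the use of SQLite in place of MySQL are all assumptions for illustration.

```python
import sqlite3

# Hypothetical columns; the real ones live in the schema document.
SCHEMA = """
CREATE TABLE project_person_bug_list_count (
    project_id  INTEGER NOT NULL,
    person_id   INTEGER NOT NULL,
    bug_list_id INTEGER NOT NULL,
    counted_on  TEXT    NOT NULL,  -- ISO date, one snapshot per day
    bug_count   INTEGER NOT NULL,
    PRIMARY KEY (project_id, person_id, bug_list_id, counted_on)
)
"""


def record_count(conn, project_id, person_id, bug_list_id, counted_on, bug_count):
    """Store (or overwrite) one day's count for one person and one bug list."""
    conn.execute(
        "INSERT OR REPLACE INTO project_person_bug_list_count "
        "VALUES (?, ?, ?, ?, ?)",
        (project_id, person_id, bug_list_id, counted_on, bug_count),
    )


def graph_series(conn, project_id, person_id, bug_list_id):
    """Return [(date, count), ...] in date order, ready to plot."""
    cur = conn.execute(
        "SELECT counted_on, bug_count FROM project_person_bug_list_count "
        "WHERE project_id = ? AND person_id = ? AND bug_list_id = ? "
        "ORDER BY counted_on",
        (project_id, person_id, bug_list_id),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real DB
conn.execute(SCHEMA)
record_count(conn, 1, 7, 3, "2008-06-11", 12)
record_count(conn, 1, 7, 3, "2008-06-12", 9)
print(graph_series(conn, 1, 7, 3))
```

INSERT OR REPLACE is SQLite syntax. On MySQL the same "one row per day" behaviour would need REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE instead.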


2) Total quantity of data - # of rows
http://spreadsheets.google.com/ccc?key=pwxRLPxLbuoIXbKN4R_3Meg

The spreadsheet shows each table and the number of rows I expect it to contain. I was very conservative with my estimates (erring on the high side), and I'd be stunned if the app grew this much. That said, if I were good at estimating application demand, I wouldn't be doing this project in the first place.

The key takeaway is that all of the configuration and site rendering records for 500 projects come to roughly half a million rows. But when I add the record of bug counts for every person and every list (see the schema doc), that adds 11 billion rows over the course of a year.
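
As a rough sanity check on that figure, here is a back-of-the-envelope sketch. It assumes the 11 billion rows are spread evenly over the same 500 projects and over 365 days, which is my simplification and not something taken from the spreadsheet. The function names and the per-project inputs to yearly_rows are hypothetical.

```python
def yearly_rows(projects, people_per_project, lists_per_person,
                snapshots_per_day=1, days=365):
    """Rows added per year if every person/list pair is snapshotted daily."""
    return (projects * people_per_project * lists_per_person
            * snapshots_per_day * days)


def implied_rows_per_project_per_day(total_rows, projects, days=365):
    """Work backwards from a yearly total to the daily load per project."""
    return total_rows / (projects * days)


# 11 billion rows a year over 500 projects works out to about
# 60,000 new rows per project per day.
print(round(implied_rows_per_project_per_day(11_000_000_000, 500)))  # 60274
```

Seen that way, the yearly total is driven almost entirely by how many person and bug list pairs each project has, so that is the number worth double-checking.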

Is 11 billion rows a lot for a single MySQL table?

If so, I'll look into a way to collapse that data. With some post-processing it would be possible to reduce the number of rows to one per person per bug list by creating a comma-separated string of bug counts.
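
Here is a minimal sketch of what that post-processing could look like. The row layout (person_id, bug_list_id, day, count) and the collapse_counts and expand_counts helpers are assumptions for illustration, not part of the schema.

```python
from collections import defaultdict


def collapse_counts(rows):
    """Collapse daily rows into one comma-separated string per person/list.

    rows: iterable of (person_id, bug_list_id, day, count), where day is an
    ISO date string so that sorting the strings sorts them chronologically.
    """
    by_key = defaultdict(list)
    for person_id, bug_list_id, day, count in rows:
        by_key[(person_id, bug_list_id)].append((day, count))
    return {
        key: ",".join(str(count) for _, count in sorted(points))
        for key, points in by_key.items()
    }


def expand_counts(packed):
    """Turn a packed string such as "12,9" back into a list of ints."""
    return [int(c) for c in packed.split(",")]


rows = [
    (7, 3, "2008-06-12", 9),
    (7, 3, "2008-06-11", 12),
    (8, 3, "2008-06-11", 4),
]
packed = collapse_counts(rows)
print(packed[(7, 3)])                 # 12,9
print(expand_counts(packed[(7, 3)]))  # [12, 9]
```

The trade-off is that the date of each count is now implied by its position in the string, so the collapsed row would also need a start date, and any missed day would need a placeholder value.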


3) Block or word diagram on how to aggregate data
http://docs.google.com/Doc?id=dfn4hjr3_174gq9pm9gq

This is a pretty basic explanation of what happens when you try to collect bug data.


4) Create a realistic schedule - research and deliverables
I've created a Google Calendar. There's not much on it yet.
My goal over the next two weeks is to:
  1. Get my development environment set up
  2. Write the models and migrations based on the schema I've designed
  3. Prototype the migration script
Once I have the migration script, I'll have a real-life test of mapping the existing data from the legacy system to the new system. This should uncover any major gaps in the transition between the systems.

After that, I'll go after one of the following:
  1. Prototype messaging layer for communication between the central controller and the generator
  2. Research changes required to the generator
  3. Research access to NFS
  4. Development of Central Controller site configuration pages
