Friday, June 27, 2008

First Line of Code

In the last two weeks I successfully:
  1. Got my development environment set up
  2. Created the models and migrations based on the schema I've designed
  3. Prototyped the migration script
Dev environment: I'm using Aptana Studio (the evolved form of RadRails for Eclipse) as my IDE. It's come a long way since RadRails. There are a lot of IDE choices out there; I chose Aptana because most of the developers I know are using some form of Eclipse, so this gives me a chance to try out all the cool plugins they're always talking about. I also installed the Bitnami Ruby Stack, which includes MySQL.

Create Models & Migrations: I created a Rails project in Aptana and constructed the scaffold (down to migrations) for each of the tables in my schema.

Some of the scaffolds will be thrown away (join tables) but it’s useful to be able to view and manipulate data from the UI for the moment.
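For reference, each scaffold came from the standard Rails generator, followed by a migration run. The Project fields here are placeholders rather than my actual schema:

  script/generate scaffold Project name:string start_date:date
  rake db:migrate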

I made a few simple changes/notes to the schema.

  1. In the projects table, start_date and projection_start_date seem to be duplicates. I need to go back and validate the meaning of each of those columns.
  2. I added an application_id column to data tables such as the bug_tool_urls table. This table is populated once and rarely (almost never) changes; essentially it contains constants. The application_id column gives me strict control over the ID that other products will use to join to this table. I've run into a problem where the default id column remembers the last row's ID even after I delete that row, which makes it difficult to map to the URLs when importing data. (A sketch of this migration follows the list.)
  3. Even as I added the column I thought of a few ways to easily eliminate the need for application_id. I have a hunch it won't be long-lived.
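The migration for item 2 is a one-liner plus standard Rails boilerplate; the table name comes from my schema, the rest is a plain add_column/remove_column pair:

  class AddApplicationIdToBugToolUrls < ActiveRecord::Migration
    def self.up
      add_column :bug_tool_urls, :application_id, :integer
    end

    def self.down
      remove_column :bug_tool_urls, :application_id
    end
  end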

So the migrations are written and I created a SQLite3 DB using them. I'll switch to MySQL when I get closer to a fully functioning system.

Data Import: I also prototyped scripts for migrating the existing text- and XML-based site data into the DB. I've created a helper class that holds all the brains of the migration. It contains POC code that (see the sketch after this list):

  1. Traverses the directory structure of an existing site
  2. Reads XML site properties using REXML
  3. Puts those properties into a new Project model and saves it to the DB
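Here's a stripped-down sketch of that helper class. The file name and XML element paths are placeholders, not the real site layout:

  require 'rexml/document'

  class SiteImporter
    # Walk each site directory, read its XML properties, save a Project.
    def import(root_dir)
      Dir.glob(File.join(root_dir, '*')).each do |site_dir|
        next unless File.directory?(site_dir)
        xml_path = File.join(site_dir, 'site.xml')  # placeholder file name
        next unless File.exist?(xml_path)
        doc = REXML::Document.new(File.read(xml_path))
        project = Project.new
        project.name       = doc.elements['site/name'].text       # placeholder paths
        project.start_date = doc.elements['site/startDate'].text
        project.save!
      end
    end
  end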

The trick to making this work is to run the script using script/runner, which spins up the Rails environment for this project and gives me access to the DB through the Project model via ActiveRecord. It's a fully scriptable environment with all the ease of Rails-based DB access!
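Invocation looks something like this (the path is made up):

  ruby script/runner 'SiteImporter.new.import("/data/legacy_sites")'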


Next steps

#1 Complete Data Import 
Everything else depends on the existence of data in the DB. If I proceed with another part of the project first, I'll need to build mock data. Given that I have loads of production data, it's probably faster and less error-prone to finish the data import and use real data when developing the other parts.


#2 Prototype the communication channel between the Central Controller and Generators.

This has three parts:

  1. Integrating ActiveMQ with the existing Generator
  2. Integrating ActiveMQ with the Rails application
  3. Prototyping messaging between the systems (a first sketch follows this list)
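For part 3, my starting point will likely be the stomp gem, since ActiveMQ speaks the STOMP protocol. A minimal sketch, assuming a broker on localhost and made-up queue names (exact client method names are worth double-checking against the installed gem version):

  require 'rubygems'
  require 'stomp'

  # Connect to the broker; 61613 is ActiveMQ's default STOMP port.
  client = Stomp::Client.new('', '', 'localhost', 61613)

  # Central Controller -> Generator: request a site rebuild.
  client.publish('/queue/generate_requests', 'project_id=42')

  # Generator -> Central Controller: listen for status replies.
  client.subscribe('/queue/generate_status') do |message|
    puts message.body
  end

  client.join  # block so the subscription callback keeps running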

Once the systems are talking I'll run real data sets through them to make sure message load and sizes are reasonable. I've already thought this through, so what I really need is the real data to run some realistic tests.

The Rails Administration portal GUI didn't make the list: while it will be fun, the scaffolds are sufficient for supporting the development of the other portions.

The data import and AMQ work will take me through to the beginning of August, at which time I'll assess next steps.

Thursday, June 12, 2008

Back to Work After Moving

I'm back on the job after moving to a new town. My thesis adviser gave me four action items a while back. Here are the results:

1) Send Schema
http://docs.google.com/Doc?docid=dfn4hjr3_156gsj5k89b&hl=en

I posted this once before, but have since made one major update.

I'm leaning towards generating graphing data dynamically from the DB. This means I need to store the bug counts for the cross section of each bug list for each person at least once a day. See the Table: project_person_bug_list_count section of the schema document for details.
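As a sketch, the migration for that table would look something like the following; the column names are my guesses, and the schema doc is authoritative:

  class CreateProjectPersonBugListCounts < ActiveRecord::Migration
    def self.up
      create_table :project_person_bug_list_counts do |t|
        t.integer :project_id
        t.integer :person_id
        t.integer :bug_list_id
        t.integer :bug_count
        t.date    :counted_on  # one snapshot per day
      end
    end

    def self.down
      drop_table :project_person_bug_list_counts
    end
  end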


2) Total quantity of data - # of rows
http://spreadsheets.google.com/ccc?key=pwxRLPxLbuoIXbKN4R_3Meg

The spreadsheet shows each table and the expected number of rows it will contain. I was very conservative with my estimates and I'd be stunned if the app grew this much. That said, if I were good at estimating application demand, I wouldn't be doing this project in the first place.

The key takeaway is that all of the configuration and site rendering records for 500 projects come to roughly half a million rows. But when I add in the record of the bug counts for every person and every list (see schema doc), that adds 11 billion rows over the course of a year.

Is 11 billion rows a lot in terms of a MySQL table?

If so, I'll look into a way to collapse that data. With some post-processing it would be possible to reduce the number of rows to one per person per bug list by creating a comma-separated string of bug counts.
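MySQL can do most of that collapsing itself with GROUP_CONCAT. A rough cut through ActiveRecord, again with guessed table and column names:

  # One row per (person, bug list), daily counts flattened to a CSV string.
  rows = ProjectPersonBugListCount.find_by_sql(<<-SQL)
    SELECT person_id, bug_list_id,
           GROUP_CONCAT(bug_count ORDER BY counted_on) AS daily_counts
    FROM project_person_bug_list_counts
    GROUP BY person_id, bug_list_id
  SQL

One caveat I'd have to watch: GROUP_CONCAT truncates at group_concat_max_len (1024 characters by default), so a year of daily counts might need that setting raised.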


3) Block or word diagram on how to aggregate data
http://docs.google.com/Doc?id=dfn4hjr3_174gq9pm9gq

This is a pretty basic explanation of what happens when you try to collect bug data.


4) Create a realistic schedule - research and deliverables
I've created a google calendar. There's not much on it yet.
My goals for the next two weeks are to:
  1. Get my development environment set up
  2. Create the models and migrations based on the schema I've designed
  3. Prototype the migration script
Once I have the migration script I'll have a real-life test of mapping the existing data from the legacy system to the new system. This should uncover any major gaps in transitioning between the systems.

Following that I'll go after one of the following:
  1. Prototype the messaging layer for communication between the Central Controller and the Generator
  2. Research changes required to the Generator
  3. Research access to NFS
  4. Develop the Central Controller site configuration pages