Thursday, November 13, 2008

Loading Static Data.

I need to have some static data in there for a few of the tables:
  1. bug_columns
  2. bug_tool_urls
  3. ownership_models
  4. site_update_frequencies
There are a lot of options for how to pre-populate data into the DB. I decided to go with YML fixtures because it's just so easy.

I love that any time I need to make a change I just type  

      rake db:fixtures:load

I'm glad to see that I included an application_id for these tables in the schema. [pat self on back] It makes it easier to populate the DB and have foreign key references to these tables without having to worry about how the DB manages the auto generated "id" column.
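For my own reference, a fixture for one of these tables looks something like this (the row labels, column names, and values below are illustrative guesses, not the real data):

```yaml
# test/fixtures/ownership_models.yml (hypothetical contents)
individual:
  id: 1
  application_id: 1
  name: Individual

team:
  id: 2
  application_id: 2
  name: Team
```

Each top-level key is a row label, and the application_id column is populated explicitly so other tables can join against a value I control.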

Saturday, October 11, 2008

Two Problems

One of the things I knew, and ignored anyway, is that checking in untested code is a bad idea. Compound that by leaving said code alone for a few months, long enough to forget the context of what I had done, and I have a recipe for putting this project back together like Humpty Dumpty.

Technical Background:

To do the data import I run the command script/runner scripts/import/project.rb. That's not the problem, but it's important information. project.rb does the following:

      thesis_importer = ThesisImporter.new("directory for the legacy data files")
      thesis_importer.load_project_data

ThesisImporter is a file in app/helper that holds the logic for parsing the existing data files, putting the data into model objects, and saving them to the DB.
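The importer's shape, roughly (this is my reconstruction from the description above, not the actual code; parse_site_xml and the directory layout are assumptions):

```ruby
# Hypothetical sketch of ThesisImporter's structure; the real class
# parses the legacy files with REXML and saves ActiveRecord models.
class ThesisImporter
  def initialize(data_dir)
    @data_dir = data_dir
  end

  # Walk the legacy directory tree and import one project per site file.
  def load_project_data
    Dir.glob(File.join(@data_dir, "*", "site.xml")).map do |path|
      attrs = parse_site_xml(path)
      # Project.create!(attrs)   # the real code saves via ActiveRecord
      attrs
    end
  end

  private

  # Placeholder: the real parser reads site properties with REXML.
  def parse_site_xml(path)
    { :source_file => path }
  end
end
```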

I have two problems:

1) Logging
The code in ThesisImporter used to live in a plain old script where I created a logger that I used throughout the code:

      logger = Logger.new(STDOUT)

In theory, when I run the script using script/runner, there should be a logger available in the environment. BUT, as soon as the process gets to the first line that uses the logger...

      logger.info("this thing doesn't work at all")

...I get an error saying logger is an undefined variable. I know it's been a while, but I thought the Rails environment set up logger for me. So why can't I use it in a helper I'm calling from a script with script/runner?
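A note to future me: a bare `logger` local only exists if something defines it. As far as I know, script/runner (Rails 2.x) exposes the global constant RAILS_DEFAULT_LOGGER rather than a `logger` variable inside arbitrary classes, so the likely fixes are sketched below (the Rails-specific lines are assumptions and are commented out since they need the Rails environment):

```ruby
require 'logger'

# Option 1: create the logger explicitly, as the old standalone script did.
logger = Logger.new(STDOUT)
logger.info("importer starting")

# Option 2 (inside script/runner, Rails 2.x): use the global Rails logger.
# RAILS_DEFAULT_LOGGER.info("importer starting")

# Option 3: borrow ActiveRecord's logger from inside the helper class.
# ActiveRecord::Base.logger.info("importer starting")
```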

2) Database
So far I've been using the Rails embedded SQLite DB, but it was an enormous pain to see any of the data in that database, so I decided to switch to MySQL. I installed MySQL 5.0, changed the encoding of both Rails and MySQL to "utf8", and fired it up. I was able to create the schema in MySQL by calling

      script/runner db/schema.rb
Now, if I go back to running my project.rb script above (commenting out all the logging and replacing it with "puts"), I get an error as soon as I try to save an object: the Bitnami Ruby Stack complains that I'm not using a production MySQL binding in Rails and that I need to install the C binding for MySQL. My code stops working there, so I try installing the C binding.

      gem install mysql

The doc install appears to fail, but everything else (the actual binding install) works... I restart everything... Then, when I run the project.rb script again, I get the same error saying I'm still using the wrong binding.

OK. It just occurred to me that I'm dealing with two MySQL instances: one from Bitnami and the one I installed last night. It's possible the gem install is somehow specific to the Bitnami MySQL instance. However, my Rails app is pointing at the NEW MySQL 5.0 install from last night. I'll try pointing at the Bitnami install tonight.
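If the two installs really are the issue, the mysql gem's native extension can be pointed at a specific MySQL explicitly; the path below is a placeholder that would need to match the actual Bitnami install:

```shell
# Build the mysql gem's C extension against a specific MySQL install
# (the mysql_config path is hypothetical; substitute the real location).
gem install mysql -- --with-mysql-config=/path/to/bitnami/mysql/bin/mysql_config
```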

Conclusion
I should never leave broken code alone for more than a few hours. Context is more valuable than all the tea in China.

I still have no idea why the logging error is happening. I need to go look through some old Rails code to see if I did something special to make logger.info available.

The MySQL problem could be an issue with the two MySQL installations fighting with each other.

I can't wait to get past these environment issues and get back to the actual project. More as this develops.

Wednesday, July 23, 2008

4th of July Update

Development Environment

Over the 4th of July weekend I had a bunch of time to work on my thesis under some strange conditions.

  • In a house in upstate NY
  • Babysitting my sleeping kids
  • No internet connection
  • Forgot all my reference books
  • Listening to the Spiderwick Chronicles on my iPod

Data Import

My past experience has shown me that there is no substitute for real data in development and testing, so I'm focusing on getting the data imported from the legacy system into the new Rails-based central controller.

The data is relatively well structured in XML and CSV files, but it is organized differently than it will be in my new relational DB, which means there's some work to do transforming the data as I import it.

So far I've completed the import code for:

  1. Basic project/site meta-data
  2. Bug list definition

While listening to the Spiderwick Chronicles with no access to reference material I stubbed out the code for importing:

  1. Owner & Team Lists
  2. Bug list Projections

There's a lot of work to do building the relationships between the tables, and I've captured that in FIXME: comments in the code.

Source Code

Having deleted useful code more than once in the short time I've been working on this project, I decided it was time to start using source control. Once I got back to an internet connection I set up a repository.

I'm hosting the project on google code at http://code.google.com/p/thesisdev/ and using the subclipse plugin to handle checkins.

The one thing that bugs me is that I checked in non-working code. It's not something I like doing, but I really didn't want to lose any of the Spiderwick work.

My next job is to clean up all the pseudo code from the 4th and finish the import scripts.

Friday, June 27, 2008

First Line of Code

In the last two weeks I successfully:
  1. Got my development environment set up
  2. Created the models and migrations based on the schema I've designed.
  3. Prototyped the migration script
Dev environment: I'm using Aptana Studio (the evolved form of RadRails for Eclipse) for an IDE. It's come a long way since RadRails. There are a lot of IDE choices out there. I chose Aptana because most of the developers I know are using some form of Eclipse, so this gives me a chance to try out all the cool plugins they're always talking about. I also installed the Bitnami Ruby Stack, which includes MySQL.

Create Models & Migrations: I created a Rails project in Aptana and constructed the scaffold (down to migrations) for each of the tables in my schema.

Some of the scaffolds will be thrown away (join tables), but it's useful to be able to view and manipulate data from the UI for the moment.
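For my own notes, generating one of those scaffolds looked something like this (Rails 2.x generator syntax; the model name and fields here are just an example, not my actual schema):

```shell
# Rails 2.x scaffold generator: model, migration, controller, and views
script/generate scaffold Project name:string start_date:date
rake db:migrate
```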

I made a few simple changes/notes to the schema.

  1. In the projects table, start_date and projection_start_date seem to be duplicates. I need to go back and validate the meaning of each of those columns.
  2. I added an application_id column to data tables such as bug_tool_urls. This table is populated once and rarely (almost never) changes; essentially it contains constants. The application_id column is meant to give me strict control over the ID that other products will use to join to this table. I've run into a problem where the default id column remembers the last row's ID even if I delete that row, which makes it difficult to reliably map to the URLs when I'm importing data.
  3. Even as I added the column I thought of a few ways to easily eliminate the need for the application_id. I have a hunch it won't be long-lived.

So the migrations are written and I created a SQLite3 DB using them. I'll switch to MySQL when I get closer to a fully functioning system.

Data Import: I also prototyped scripts for migrating the existing text and XML based site data into the DB. I've created a helper class that has all of the brains of the migration. It contains POC code that:

  1. Traverses the directory structure of an existing site
  2. Reads XML site properties using REXML
  3. Puts those properties into a new Project model and saves them to the DB

The trick to making this work is to run the script using script/runner, which spins up the Rails environment for this project and gives me access to the DB through the Project model using ActiveRecord. It's a fully scriptable environment with all the ease of Rails-based DB access!
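A minimal version of the REXML step looks like this (the element and attribute names are made up for illustration, not the real legacy schema):

```ruby
require 'rexml/document'

# Parse a site properties XML document into a plain hash.
def read_site_properties(xml)
  doc = REXML::Document.new(xml)
  props = {}
  # Iterate over <property> children of the root <site> element.
  doc.elements.each("site/property") do |el|
    props[el.attributes["name"]] = el.text
  end
  props
end

sample = <<XML
<site>
  <property name="title">Demo Project</property>
  <property name="owner">alice</property>
</site>
XML

props = read_site_properties(sample)
# props => {"title" => "Demo Project", "owner" => "alice"}
```

From there the hash can be handed to a model, e.g. something like Project.new(...) followed by save, inside the script/runner environment.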


Next steps

#1 Complete Data Import 
Everything else depends on the existence of data in the DB. If I proceed with another part of the project first I’ll need to build mock data. Given that I have loads of production data, it’s probably faster and less error prone to finish the data import and use real data when developing the other parts.

#2 Prototype the communication channel between the Central Controller and Generators.

This has three parts

  1. Integrating ActiveMQ with the existing Generator.
  2. Integrating ActiveMQ with the Rails Application
  3. Prototype messaging between the systems

Once the systems are talking I'll use real data sets to make sure message load and sizes are reasonable. I've already thought this through, so what I really need is the real data so I can run some realistic tests.

The Rails Administration portal GUI – While this will be fun, the scaffolds are sufficient for supporting the development of the other portions.

The data import and AMQ work will take me through to the beginning of August, at which time I'll assess next steps.

Thursday, June 12, 2008

Back to Work After Moving

I'm back on the job after moving to a new town. My thesis adviser gave me 4 actions a while back. Here's the result:

1) Send Schema
http://docs.google.com/Doc?docid=dfn4hjr3_156gsj5k89b&hl=en

I posted this once before, but have since made one major update.

I'm leaning towards generating graphing data dynamically from the DB. This means I need to store the bug counts for the cross section of each bug list for each person at least once a day. See the Table: project_person_bug_list_count section of the schema document for details.

2) Total quantity of data - # of rows
http://spreadsheets.google.com/ccc?key=pwxRLPxLbuoIXbKN4R_3Meg

The spreadsheet shows each table and the expected number of rows it will contain. I was very conservative with my estimates and I'd be stunned if the app grew this much. That said, if I was good at estimating application demand, I wouldn't be doing this project in the first place.

The key takeaway from the information is that all of the configuration and site rendering records for 500 projects come to roughly half a million rows. BUT, when I add in the record of the bug counts for every person and every list (see schema doc), that adds 11 billion rows over the course of a year.

Is 11 billion rows a lot in terms of a MySQL table?

If so, I'll look into a way to collapse that data. With some post-processing it would be possible to reduce the number of rows to one per person per bug list by creating a comma-separated string of bug counts.
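A sketch of that collapse in Ruby (the sample rows are made up; the real job would presumably run as SQL or a batch script over the counts table):

```ruby
# Collapse one row per (person, list, day) into one row per (person, list)
# whose counts column is a comma-separated daily series.
rows = [
  { :person => "alice", :list => "open", :day => 1, :count => 4 },
  { :person => "alice", :list => "open", :day => 2, :count => 6 },
  { :person => "bob",   :list => "open", :day => 1, :count => 2 },
]

collapsed = rows.group_by { |r| [r[:person], r[:list]] }.map do |(person, list), day_rows|
  series = day_rows.sort_by { |r| r[:day] }.map { |r| r[:count] }.join(",")
  { :person => person, :list => list, :counts => series }
end
# 3 daily rows become 2 collapsed rows; alice's series is "4,6"
```

The trade-off is that the per-day counts are no longer queryable individually, which is probably fine for data that only feeds graphs.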

3) Block or word diagram on how to aggregate data
http://docs.google.com/Doc?id=dfn4hjr3_174gq9pm9gq

This is a pretty basic explanation of what happens when you try to collect bug data.

4) Create a realistic schedule - research and deliverables
I've created a google calendar. There's not much on it yet.
My goal in the next two weeks is to:
  1. Get my development environment set up
  2. Create the models and migrations based on the schema I've designed
  3. Prototype the migration script
Once I have the migration script I'll have a real-life test of mapping the existing data from the legacy system to the new system. This should uncover any major gaps I have in transitioning between the systems.

Following that I'll go after one of the following:
  1. Prototype messaging layer for communication between the central controller and the generator
  2. Research changes required to the generator
  3. Research access to NFS
  4. Development of Central Controller site configuration pages

Friday, May 2, 2008

DB Schema - First Draft

I've completed the first draft of the schema. These are my notes on the mappings from the old XML-based fields to the new relational model. I only covered the parts that did not map 1 to 1.

This is a picture of the first draft schema. Click the image to see a larger (readable) version.

Thursday, May 1, 2008

Messaging and Active MQ

In the proposal I alluded to using web services to communicate between the central controller and the generators, but I think I'm going to use ActiveMQ instead.

I spoke with Ryan yesterday and he said the team is using ActiveMQ for messaging between processes and other apps in their group and it would be trivial for me to use their production bus.

AMQ appears to address all of the requirements for communication for my project and leaves a lot of room to grow new functionality in the future.

The standard team message queues are set up as follows:

Standard base queue path:

      /queue/(env)[production||development]/(app)[application name]/

Postfix non-standard queue path:

      (standard)/(action)/(arguments)

To get this working I need the following libraries:

      Ruby --> gem install stomp
      Perl --> Net::Stomp
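The queue-path convention above, sketched as a little helper (the env/app/action names are placeholders, and the real sending code would go through the stomp gem against the team's broker):

```ruby
# Build a queue path following the team convention:
#   /queue/(env)/(app)                       -- standard base path
#   /queue/(env)/(app)/(action)/(arguments)  -- non-standard postfix
def queue_path(env, app, action = nil, *args)
  base = "/queue/#{env}/#{app}"
  action ? [base, action, *args].join("/") : base
end

queue_path("development", "thesis")
# => "/queue/development/thesis"
queue_path("production", "thesis", "generate", "site42")
# => "/queue/production/thesis/generate/site42"
```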

Some things to look out for with ActiveMQ

Bug #1
There's a bug in ActiveMQ that makes all messages into binary messages if the header size is set. The fix for this is to:

Comment out the `content-length` header in `def transmit` in

/usr/local/lib/ruby/gems/1.8/gems/stomp-1.0.5/lib/stomp.rb

Bug #2
There's a bug in serializing arrays to YAML, so I'm going to serialize to XML instead.

Officially Starting

My thesis proposal has been accepted and I'm registering for my last class, which is basically my thesis class.

That gives me a maximum of 9 months, or until February 12th, 2009, to finish my thesis and get a grade.

Hello World

Looks like everything is working.