Thursday, June 4, 2009

Thesis Document

The final version of my thesis document was approved on April 28th.  A PDF version is available here.





Per the requirements of the thesis course I sent it off to the binders who were kind enough to turn it into an actual book for just under $100. The final version of the book was approved by my thesis director on May 11, 2009 at which point I took up sleeping again.  The book is now sitting on a shelf in the Grossman Library in Harvard Yard.





I haven't actually seen the bound copy since it was sent directly from the binders to the school.   I think I'll stop by the library when I go to pick up my diploma. I've been thinking about taking Alex (my 5 year old son) and making a day of it.

Tuesday, April 28, 2009

Advanced Page Numbering in Microsoft Word

For the past few weeks I've been responding to review comments on my thesis document. It boils down to a lot of tweaking the text and proofreading until I wished I'd never heard of my thesis project. But it was all pretty straightforward until tonight.

Today, in what I hope is the last review comment, I was told that the page numbering in my 262-page document was not correct. My document starts with the first page numbered 1 and increments up to page 262.

But the thesis document requirement is:
  1. The first three pages of the document should have no page numbers
  2. Starting on the 4th page, pages should be numbered with Roman numerals, beginning at iv. (So the first three pages are i, ii, and iii, but those numbers are supposed to be hidden.)
  3. Starting with the first page of Chapter 1, the pages should be numbered with incrementing Arabic numbers. 1, 2, 3...
If you, like me, are a casual Microsoft Word user you will find this task infuriating. As did the author of the Harvard Extended blog in this post. He came up with an answer, but it turns out there was a much simpler way to meet the above requirements.


The Solution
  1. Read How to control the page numbering in a Word document
  2. Find some way to thank Bill Coan for identifying this preposterous use case and writing such a useful article about how to address it.
  3. Create a new "Section" at the end of each of the first three pages using the menu Insert >> Break >> Section Break. Section breaks replace page breaks in these locations and will let you number the pages per the requirements above.
  4. Create a "Section" just before the Chapter 1 chapter heading using the same method as in step 3.
  5. Put the cursor on the first page and add page numbers using the menu Insert >> Page Numbers. Be sure to un-check "Show number on first page," just like I did in the picture to the right. Since this page is its own section, no page numbers will appear.
  6. Repeat this step for pages 2 and 3 if page numbers are showing. Those sections may inherit the settings from page one, in which case you may not need to do anything.
  7. Put the cursor on page 4 and add page numbers again, but this time (per the Page Number Format image):
    1. CHECK "Show number on first page"
    2. Choose Alignment "Center"
    3. Click Format...
    4. Choose Number Format "i, ii, iii..."
    5. Choose Start At "iv"
  8. Put the cursor on the first page of Chapter 1 and add page numbers again, but this time:
    1. CHECK "Show number on first page"
    2. Choose Alignment "Center"
    3. Click Format...
    4. Choose Number Format "1, 2, 3..."
    5. Choose Start At "1"

That's it. You should now have a document that follows the page numbering conventions for the Harvard ALM in IT Master's Thesis.


Acknowledgments

I'd like to thank Ian Lamont, the author of Harvard Extended. I wish I had found his blog more than two weeks before I was about to finish school. And of course thanks to Google for pointing me to Bill Coan's obscure but brilliant article on MS Word page numbering.

Tuesday, March 31, 2009

Zero Hour


I turned my 89-page thesis document in to my thesis adviser and thesis director at 11:15 PM on March 31, 2009, 45 minutes before it was due.

I expect them to send back comments and change requests. There will probably be a few more revisions before the thesis is accepted.

There is one outstanding task: the code needs to be included in the final doc, so I need to write a script that reads through the Rails code and converts it into the format the doc expects.
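
Something like this rough sketch is what I have in mind (the paths and the output format are placeholders; the thesis template will dictate the real format):

# rough sketch: gather the Rails source files and dump them into one
# appendix-friendly text file. Paths and formatting are placeholders.
files = Dir.glob("controller/{app,lib,db/migrate}/**/*.rb").sort

File.open("appendix_code.txt", "w") do |out|
  files.each do |path|
    out.puts "=" * 72
    out.puts "File: #{path}"
    out.puts "=" * 72
    out.puts File.read(path)
    out.puts
  end
end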

I should be bouncing off the walls, but at the moment I'm too tired to be excited. Allison seems to have a horrible stomach bug, so I'm off to be Dad and check on her before I pass out.

Friday, March 20, 2009

Thesis Document First Draft

The first draft of the thesis document is complete! It came in at a paltry 50 pages.

Fortunately I'll be able to pad the page count with all the code I'm required to put in the appendix. Rendered as a PDF it only takes up 1.4 MB of space. It's almost as if I didn't have anything to say.

Since the chances of my getting it just right the first time are next to nil, I'm expecting a pile of comments and recommendations from my thesis adviser. I'll have to act quickly to make any changes he requests before I turn it in on April 1st.


In the meantime, this is what Google Docs thinks of the thesis document.

Saturday, March 7, 2009

A Very Beautiful Graph

Of all the parts of my project, graphing has the most moving parts. It requires the generators to work, the controller to respond quickly, and a bunch of JavaScript to combine lots of data from both the generator and the controller and format it into a Google Chart URL.



This is an actual graph, of actual projections, and actual bug counts. It was served up by the controller after I clicked a graph link on a generated project summary page. It's using completely real data from the controller DB and the generator's history.js files, generated at the same time as the enotify project page.


In other words: it works!

More importantly, there are NO caveats!

This is production data.  I didn't hold its hand or even watch.  I just came back after the generator did its thing, went to the web page, and TA-DAAAAA!
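
For the curious, the URL the JavaScript assembles is roughly this shape. Here's the same assembly sketched in Ruby with made-up numbers (the real code lives in the generated pages' JavaScript):

require 'cgi'

# made-up bug counts and projections; the real values come from the
# controller DB and the generator's history.js files
actual     = [12, 15, 11, 9, 7]
projection = [14, 12, 10, 8, 6]

params = {
  "cht"  => "lc",                                    # line chart
  "chs"  => "400x200",                               # chart size
  "chd"  => "t:#{actual.join(',')}|#{projection.join(',')}",
  "chdl" => "Actual|Projected"                       # legend
}

url = "http://chart.apis.google.com/chart?" +
      params.map { |k, v| "#{k}=#{CGI.escape(v)}" }.join("&")
puts url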

Running the Real Thing

I just launched a longevity test of the system using current production data. To this point all of the runs used data that was over 6 months old. While that data is valid, there are features of the graphing system that don't get exercised.

In order to get a full real-life view of the graph functionality you need projections that span at least a week before and after the current date. While the graphs were tested with test projections that met that requirement, the 6-month-old data didn't provide a real-life view of a fully exercised graph.

So basically, I've tested this thing every way I can, and the only thing left to do is to import production data into the controller and let the generators do their thing.

If all goes as expected I'll be able to demo the running system with production data and extremely cool, live updating graphs.

Late night thoughts and Connecting to MySql on a Remote Machine

I'm sitting here in the dark loading lots of production data into the project so I can get some realistic looking graphs for the demo of the project.

Two things occur to me at this late hour:

  1. This project, while ridiculously large in scope, was a good idea. Even as I run the import script on data from different sites, I'm discovering that as structured as the data is, there are still differences between the data formats and types on the legacy server. A central DB and controller is long overdue.
  2. I should have written down the instructions for enabling a remote mysql client/user to connect to a mysql instance. Here they are for the next time:


Users need permission to connect to MySQL on a remote Linux machine. Without it, trying to log in with MySQL Administrator or MySQL Query Analyzer will fail with a 1045 error.

To fix this, log into a shell on the MySQL server and make sure you have granted access to your user from the mysql command line:
mysql> GRANT ALL ON databaseName.* TO 'your_mysql_name'@'your_client_host_or_ip';
I found these instructions @ http://forums.mysql.com/read.php?35,9919,12139#msg-12139
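
A quick way to sanity-check the grant from the client side (a sketch using the classic mysql gem; the host, user, password, and database names are placeholders):

require 'mysql'   # the classic mysql gem

# hypothetical host/user/database names; substitute your own
conn = Mysql.real_connect('enotify-controller', 'your_mysql_name',
                          'your_password', 'databaseName')
puts conn.get_server_info   # prints the server version if the grant worked
conn.close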

Wednesday, March 4, 2009

Starting the Thesis Document

My intention is to reuse most of my thesis proposal document in my final thesis document. It's not yet clear whether that's an option, and I'm not totally clear on the best way to frame this project for the thesis document.

My understanding to date is that this document is intended to describe the problem, the solution, the value of the solution, and any innovative and/or cool things I came up with or learned during the project.

My first step is to create a thesis outline, and my thesis advisor has suggested writing three documents to help me frame what I'm talking about.
  1. The top 5 benefits to the student - innovations or challenges overcome in the project
  2. The top 5 benefits to the end users
  3. A new 1 page summary of the project.
The first two documents are completed.  I started with them because they were the most straightforward, and I'm struggling to find a way to describe this project that differs from my original thesis proposal.

My goal is to have the outline and 1 page summary completed by tomorrow.

Development Complete!

Thesis development completed last night at ~10:30pm. I was too tired and relieved to write about it at the time. The code has been deployed on VMs and is currently running with copies of production projects (a.k.a. sites).

I expect to make some minimal changes as I test the project over the next two weeks. Most of them will be CSS related.

The focus of the thesis project now shifts to writing the thesis document. The entire project and thesis document is due April 1st. That is not a joke.

Wednesday, February 25, 2009

Success!

Tonight it was time to take the controller and generators for a spin and see if this thing was going to work in real life. Once the install was complete:

I fired up the controller...
ruby /usr/local/enotify/controller/script/server start

And kicked off a generator on a separate machine..
cd /usr/local/enotify/generator

perl ./bin/enotify.pl

The generator called out to the Controller and asked for a site to generate, and the Controller responded with a site. The generator generated the site, checking in and updating the controller on its progress.

When it finished the site was available through the controller web server!

http://enotify-controller/sites/faq/output/website/index.html

SUCCESS!!! 
The hard part of the development is DONE!!! The dev tasks that remain are:
  • adding the controller scheduler logic (basically translating this from the legacy system)
  • adding the replacement graphing code (all JS based and prototyped)
  • Admin UI CSS for cleaning up the look and feel
  • LDAP integration for creating owner lists.
  • testing

Install Instructions

I installed the project on virtual machines today.

The machines used
  • enotify-controller: the controller. The Rails administration app is installed here.
  • enotify-gen-01: generated site storage and generator code live here and are shared via NFS. Generator 1's cron task runs here.
  • enotify-gen-02: mounts enotify-gen-01's generator & sites directories and has a cron task that runs the generator from that machine.

Once the controller and first generator are set up with NFS, all future generators need nothing but to mount the generator code from gen-01 and run the generator. The controller takes care of the rest!!! So bringing new generators online means nothing more than finding a Linux machine, mounting the gen-01 generator NFS share, and calling

/usr/local/enotify/generator/bin/enotify.pl --remote --gen --update

I've updated enotify.pl to use a new package called Enotify::Generator::ControllerCommunicator described here.

So the change to the legacy generator was literally the addition of about 15 lines: a new option, --remote, to tell the generator to call the controller, and the calls to ControllerCommunicator.


The following are the install instructions for this setup. There's no reason to read this unless you're trying to reproduce it. I took notes while I set up the controller and generators. Here's exactly what I did to install them:


The Generator
Copy a legacy generator (existing code and libraries) to enotify-gen-01:/usr/local/enotify/generator such that you end up with /usr/local/enotify/generator/bin/enotify.pl

The Controller
Installing RUBY on the controller

wget ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6.tar.gz
tar -xzvf ruby-1.8.6.tar.gz
cd ruby-1.8.6
./configure
make
make install

ruby is installed to /usr/local/bin/ruby
* Don't believe the rubyonrails.org website: there are incompatibilities between Rails 2.0.2 and Ruby 1.8.7. You need 1.8.6 for this to work.



INSTALLING RUBY GEMS
wget http://rubyforge.org/frs/download.php/45905/rubygems-1.3.1.tgz
tar -xzvf rubygems-1.3.1.tgz
cd rubygems-1.3.1
chmod 755 setup.rb
ruby setup.rb

INSTALLING RAILS
gem install rails -v=2.0.2

# mongrel didn't get installed, so I installed it
sudo gem install mongrel


INSTALLING THE CONTROLLER RAILS APP

#copy the controller code over to enotify-controller into /usr/local/enotify/controller

#copy the legacy site directories you want to import into this directory:
mkdir /usr/local/enotify/site_to_import


INSTALLING MYSQL
# the server
wget http://dev.mysql.com/get/Downloads/MySQL-5.1/MySQL-server-community-5.1.31-0.rhel4.i386.rpm/from/http://mysql.mirrors.pair.com/

rpm -ivh MySQL-server-community-5.1.31-0.rhel4.i386.rpm

# the client
wget http://dev.mysql.com/get/Downloads/MySQL-5.1/MySQL-client-community-5.1.31-0.rhel4.i386.rpm/from/http://mysql.mirrors.pair.com/

rpm -ivh MySQL-client-community-5.1.31-0.rhel4.i386.rpm


#In order to get a remote MYSQL Administrator to be able to connect
mysql -u root mysql
grant all privileges on *.* to 'root'@'';

# now you can connect as the root user from your pc
 

Create a new DB user whose username and password match rails/controller/config/database.yml.
Right-click on the user and set it to be able to connect from localhost.















CREATE THE RAILS DBs
Under Catalogs, choose "Create New Schema" and create a new schema for each of:

thesisedev_development
thesisedev_test
thesisedev_production

# Give the new user privileges
Under Users, click on the new user and choose the Schema Privileges tab.
For each of the schemas above, add all privileges for this user.


NFS Shares and Mounts

SETUP THE NFS SHARES
On enotify-gen-01:/etc/exports add
/usr/local/enotify/generator enotify*.flubber.com(rw,no_root_squash)


SETUP THE NFS MOUNT
On the controller and every other generator add this to /etc/fstab
enotify-gen-01:/usr/local/enotify/generator /usr/local/enotify/generator nfs defaults 0 0


CONTROLLER DIRECTORIES
/usr/local/enotify/controller = rails app
/usr/local/enotify/site_to_import = sites that will be imported into the controller when you run controller/script/import/p


DATA STORE DIRECTORIES
/usr/local/enotify/generator = enotify generator code and sites.

the sites directory is [insert_your_ip_address_here]/usr/local/enotify/generator/sites because this is where the legacy generators expect to find it.


Making generated sites available from the controller

Make a symbolic link that puts the sites directory under the controller's public directory. This lets Rails serve up the static generated files.



ln -s /usr/local/enotify/generator/sites /usr/local/enotify/controller/public/sites



DB Schema and Legacy Site Import




#Set up the schema:
rake db:migrate
# migrate in the new data
./p

# start the server
ruby script/server start


 

Tuesday, February 24, 2009

Generator Controller Communication

In my last post I mentioned that I took ActiveMQ out of the project and settled on web services for communicating between the generators and the controller, but I left out how the communication would work.


I settled on a very simple model. Generators are responsible for asking for work and updating the controller with generation status. The Controller is responsible for doling out work, tracking progress, error management, and dealing with rogue generators.

Rogue generators aren't likely given their stability over the last 5 years, but if they were to go bad they would do things like:
  • Ask for a site to generate, never finish generating it, then ask for another site to generate.
  • Fail to report a status update for generating a site.
In these cases, the controller has the error checking capabilities to clean up the generation data as much as possible and either tell the generator to do nothing, re-render a site, or render a new site.
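
On the Rails side, the "doling out work" end of this is a single action. A rough sketch of the shape of it (the model names, columns, and scheduling helper here are illustrative, not the actual controller code):

# sketch of the controller action a generator hits to ask for work.
# Model and column names are illustrative, not lifted from the real app.
class GenerationsController < ApplicationController
  def next_site
    generator = Generator.find_or_create_by_ip_address(request.remote_ip)

    # clean up after a rogue generator that never finished its last site
    generator.generations.find_all_by_completed_at(nil).each do |g|
      g.update_attribute(:abandoned, true)
    end

    project = Project.next_due_for_generation   # scheduling logic lives here
    generator.generations.create(:project => project)

    render :text => project.short_name
  end
end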

The Perl Doc for the module used by the generator to communicate with the controller is probably the best way to understand how this works.

=head1 Enotify::Generator::ControllerCommunicator

  This package enables communication with the enotify controller
  web application.
 
  The functionality in this package supports the communication model
  where the generator ALWAYS initiates contact with the Controller.
  The process flow is as follows
 
  1) a cron task kicks off the generator
  2) the generator asks the controller what work to do
  3) the controller responds with the name of a project
  4) the generator generates the site & sends status messages
     to the controller during the update process
  5) the generator finishes the site update, notifies the controller and exits
     

The crontab is on a 2-minute retry, with logic to keep multiple generators from starting, so step 1 will happen at most 2 minutes after step 5 completes.
 

  What does the controller do with the status updates.
   
    The controller uses these status updates to record the progress of the generator in the Generation object. Each status message corresponds to a column in the generation table. The Controller records the current time in that column when it receives a status
    message.
   
    Status updates are simply messages saying the generator is starting or ending some part of the site generation. For example:
       
    GENERATION_BUG_LIST_DOWNLOAD_START means the generator is about to start downloading
    the site's bug lists and GENERATION_BUG_LIST_DOWNLOAD_END means the generator finished
    downloading those bug lists.

    Complete list of status messages
 
        GENERATION_START
        GENERATION_END
       
        GENERATION_BUG_LIST_DOWNLOAD_START
        GENERATION_BUG_LIST_DOWNLOAD_END

        GENERATION_RENDERING_PAGES_START
        GENERATION_RENDERING_PAGES_END

        GENERATION_RENDERING_GRAPHS_START
        GENERATION_RENDERING_GRAPHS_END

   The controller identifies all requests based on the IP address of the
   generator which is automatically passed with every call.
  
   Deployment Requirement:
     Because it is technically possible for a DHCP generator to change its IP
     address while running a site, the enterprise deployment of this mechanism
     requires generators with Static IP addresses.
    
    
   Example use:
  
    $controller_comunicator = Enotify::Generator::ControllerCommunicator::new();
    
    $site_name = $controller_comunicator->get_site_to_generate()
   
    print "Site Name " . $site_name;
   
    >Site Name foo
   
    $controller_comunicator->
    
=cut

=head2 ControllerCommunicator->get_site_to_generate() {{{

    Purpose: This method requests a new site to generate from the
              controller.
            
    Expects: Nothing. Seriously, it is the controller's responsibility
             to deal with generators that go crazy or ask for this
             info multiple times.
                 
    Returns: (STRING) The short name of the site that is being updated. This
             is also the last directory in the path to the
             directory for the site.
            
             print "enotify/sites/" . $controller_comunicator->get_site_to_generate()
   
             > enotify/sites/foo
=cut
=head2 ControllerCommunicator->update_generation_progress() {{{

    Purpose: This method sends a status update to the controller.
           
                Valid Status messages:

                        GENERATION_START
                        GENERATION_END

                        GENERATION_BUG_LIST_DOWNLOAD_START
                        GENERATION_BUG_LIST_DOWNLOAD_END

                        GENERATION_RENDERING_PAGES_START
                        GENERATION_RENDERING_PAGES_END

                        GENERATION_RENDERING_GRAPHS_START
                        GENERATION_RENDERING_GRAPHS_END
         
            
    Expects: The name of the update to send.
                 
    Returns: (BOOLEAN) True if the update succeeded. False if there was a server error.
            
             print  $controller_comunicator->update_generation_progress(GENERATION_START)
   
             > 1
=cut

Saturday, February 21, 2009

Development Status Update

The blog has been quiet because I'm going flat out on development with a goal of finishing on March 1st.

The following is a quick brain dump of my progress. No promises on spelling here. It's 1:30am on a Friday after a particularly long week.


Active MQ is Gone:
I've eliminated ActiveMQ and replaced it with basic web service calls from the generators.  REST wasn't in Rails the last time I developed a full app. I like what they've done with 2.0. It has a decidedly CRUD feel all the way through.

The switch from AMQ to REST greatly simplified the communications mechanism and got around a limitation in the way the Rails plug-in for AMQ handles, or rather doesn't handle, temporary queues.
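
The plumbing in Rails 2.0 terms is pleasantly small. Roughly the shape of the routes (the resource and action names are illustrative, not the real ones):

# config/routes.rb -- sketch only; resource and action names are illustrative
ActionController::Routing::Routes.draw do |map|
  # generators call here to ask for a site and to report progress
  map.resources :generations,
    :collection => { :next_site => :get },
    :member     => { :progress  => :post }
end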

I'll post the communication message call flow as soon as I'm done testing it (see generator perl below).

UI Update / Schema

Most of the admin UI is built. I used Active Scaffold. It took a while, but I figured out how to organize the UI work flow so that it worked around Active Scaffold limitations.

This did require a change to the schema, but it's a really good change. I was previously keeping team information in the owners table.

The Owners table had the columns project_id, person_id, manager_id. The application logic stated that if person_id == manager_id then that person was the manager of the team.

I found myself writing a lot of code to deal with this logical relationship, and when I tried to make it work with Active Scaffold it really started to hurt. So I changed the schema.

Now there is a Team table:

class CreateTeams < ActiveRecord::Migration
  def self.up
    create_table :teams do |t|
      t.integer :project_id
      t.integer :person_id   # this is the manager

      t.timestamps
    end
  end
end

And the Owners Table now looks like this:
class CreateOwners < ActiveRecord::Migration
  def self.up
    create_table :owners do |t|
      t.integer :team_id
      t.integer :person_id

      t.timestamps
    end
  end
end
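
With the new tables the models collapse down to plain convention. Roughly (a sketch, not the exact app code):

# sketch of the models after the schema change; plain Rails conventions
class Team < ActiveRecord::Base
  belongs_to :project
  belongs_to :person                       # the manager
  has_many   :owners
  has_many   :people, :through => :owners  # the team members
end

class Owner < ActiveRecord::Base
  belongs_to :team
  belongs_to :person
end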


I deleted a ton of code and ActiveScaffold just works!
Convention over configuration every time!!!


Giving Active Scaffold a Push - Dynamic action_links

For the sake of UI consistency I did make one hack on top of Active Scaffold. I needed a link at the top of the table that had a dynamic value in it. On the teams table I wanted a link to add a new team and I needed to pass the current project ID. That requires a fair rewrite of the way active scaffold passes parameters when creating action links.

Instead of doing it the hard way I just replaced the links by modifying the resulting page's DOM with some JavaScript. http://code.google.com/p/thesisdev/source/browse/trunk/controller/app/views/teams/for_project.html.erb



CGI::Session::CookieStore::CookieOverflow
Every project has a bug that just scares the crap out of me. Getting 500 server errors because of a Cookie overflow was that bug for me. It happened at random on certain pages. It turned out to be the way I was using Active Scaffold. 

I was rendering with this line 
:active_scaffold => 'teams', 
:constraints => {:project_id => @project}
Apparently Active Scaffold stores constraints in the session object. So after refreshing a few times and then leaving and returning to this page: BOOOOOMM!!! All that project info was stored in the cookie and it overflowed its 4k limit.

The fix was simple. Just change @project to @project.id so the new code is


:active_scaffold => 'teams', 
:constraints => {:project_id => @project.id}
And that gets rid of the cookie overflows.


Active Scaffold Sortable is a lie!

There's this amazing screencast showing a Rails plugin that allows you to drag and drop records to reorder them. When I installed the plugin I couldn't even get the server to start. I've tried every permutation of rails/active_scaffold versions I can think of, posted on forums, and even emailed the developer.  Result = silence.

Based on the other forum posts it appears the code worked for a brief window when the stars aligned and no one touched the active_scaffold code base. That time passed, and no one tagged the source tree, so it is lost forever.  Sadly I'm giving up on sortable, and will have to figure out some other way to order bug lists and table column headers.
 

Generator Perl:
The Perl package that enables the generator to talk to the controller via REST is written.  Testing with the controller commences tomorrow.

I still have to integrate this with the generator, but I had a look through the generator logic and it's all neatly contained in a single well-documented script. [Thank you, me from 5 years ago!]  Once the controller is tested I'll be able to drop the communication package in, and add/edit a few simple lines, to make the controller the master of the generator's workload.

Deployment:
Three Red Hat Linux Virtual Machines should be available early next week so I can test the system in a truly distributed fashion.


What's Left in Development:
  1. Error checking logic for adding teams, owners and administrators
  2. Changing the generator to listen to the controller
  3. Controller / generator integration
  4. Controller UI for generators and generations
  5. Cloning Projects, Bug_Lists and Teams
  6. Lots of testing
  7. Deployment to the test bed

Once it's deployed I'm going to start writing my paper. The application will have 3 weeks of real-life soak time while I'm writing.

While I don't plan to deploy it in production before I finish the thesis, it should be very obvious if there are any problems with the system. 

Monday, February 9, 2009

Active MQ and the mysterious XTools

Tonight was a night of minor frustrations. Working on my thesis was postponed by my daughter, who decided bed time should be moved to 9:30pm. She's very cute, but I can't work on my thesis when she's around because she insists on my reading her books. If I try to develop while reading to her I end up with variable names like Bert and Ernie.

Once she went to sleep:

I installed ActiveMQ version 5.2
  1. download
  2. untar
  3. cd [AMQ install directory]/bin
  4. chmod 755 activemq
  5. cd ..
  6. bin/activemq
And it just works.... It even has a web admin running at http://localhost:8161/admin/

Then I tried to get a simple perl script to connect via Net::Stomp.

I tried to install Net::Stomp using sudo -H cpan -i Net::Stomp and it failed because make was missing from my machine.

For the next hour my brain didn't function as I searched Google for a way to install make on OS X. The search terms "install make on OS X" are not helpful...

Eventually I found, and was simultaneously directed to, Apple's XTools. Duh... I should have looked on Apple's site first.

I've now downloaded the svelte 1GB XTools installer, but I'm too tired to actually run it. That's a task for tomorrow.

Saturday, February 7, 2009

22 Days of Development Left

My thesis is due April 1st. That means the running code and the thesis paper, which is likely to run about 75 pages, both have to be done by then.

I'm giving myself till March 1st to finish the code...

And then it's on to the paper...

Lots of progress

Since the last time I wrote about actual feature development a lot has happened.

application_id is no more
Per my previous post on application_ids I decided to get rid of them. They are now gone. That means updates to
  • migrations (removed all the application_id columns)
  • model relationships (has_many and belongs_to are now straightforward)
  • the legacy importer now looks up IDs and assigns them instead of using the application_id
Fixtures
I'd been using test/fixtures via rake db:fixtures:load to import some baseline data and constants into the dev and production DB. Once I started writing tests this didn't make sense anymore because the test fixtures were clobbering the production DB.

So I created a db/fixtures directory and told the migrations to load those fixtures. Now the baseline data is imported into the DB when I run rake db:migrate:reset
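
The trick is just a migration that loads fixtures from the new directory. Roughly (the fixture path and table name are illustrative):

# sketch: a migration that seeds baseline data from db/fixtures
# instead of test/fixtures. The table name is illustrative.
require 'active_record/fixtures'

class LoadBaselineData < ActiveRecord::Migration
  def self.up
    directory = File.join(RAILS_ROOT, 'db', 'fixtures')
    Fixtures.create_fixtures(directory, 'site_update_frequencies')
  end

  def self.down
    SiteUpdateFrequency.delete_all
  end
end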

Communication with the Generator

As I thought through the communication from the controller to the generator I realized that I was going to have to rewrite the generator property loaders in order to read whatever format I wrote the site properties in from the controller.

Dreading that task, I had a thought: why not just have the controller render the site properties and other generator config data in the exact format the generator is already expecting? So I did.

Taking it one step further I realized there was no reason to send this info over ActiveMQ. Since the controller and all the generators are connected to the same file share, I simply told the controller where the site directory was and then had it write the property files needed by the generator.

Arguably, this is going full circle. I just did all this work to import all that information out of the legacy files and into the DB, and now I'm taking it out of the DB and putting it back into the files!

But the benefit is a rapid development and test environment. It's trivial to bring the existing generator up and running since the site directory looks EXACTLY the way the generator expects. This allows me to focus on the main goal of the project. Centralizing control of the application and generation scheduling in the controller.

At this point the controller knows how to:
  1. Create a site directory from scratch
  2. publish site_properties.xml
  3. publish queries.xml
  4. publish the owner lists
The upshot of this dev work is it forced me to build all the relationships between most of the Models in order to be able to generate the content of those files.
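
Generating one of those files is conceptually just rendering a template onto the NFS share. A rough sketch of the idea (the template path and site path are placeholders, not the actual app code):

require 'erb'
require 'fileutils'

# sketch: render the generator's expected XML straight onto the shared
# site directory. Template path and site path are placeholders.
def publish_site_properties(project)
  template = ERB.new(File.read('lib/templates/site_properties.xml.erb'))
  site_dir = "/usr/local/enotify/generator/sites/#{project.short_name}"

  FileUtils.mkdir_p(site_dir)
  File.open(File.join(site_dir, 'site_properties.xml'), 'w') do |f|
    f.write(template.result(binding))   # the template can see `project`
  end
end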

Skipping Projections
I did NOT publish projections yet. Graph generation is one of the few parts of the generator I'm changing. I'll likely dynamically generate projections from the controller so they are always up to date. I'm leaving projections until I tackle the graphing part.

thesis.next
It's time to get ActiveMQ up and running for communications between the controller and generators.

Textmate +Mac = no time to write

Using TextMate, I find myself with no down time when I develop. With Aptana and Ruby on Windows I frequently found myself with entire minutes where my IDE or scripts were unresponsive... During those times I made updates to this blog.

With my new environment, Mac + TextMate, there is no down time. As a result I have a big backlog of updates. Topics include:
  • Changes to the communication workflow
  • Rendering generator understandable XML from the controller
  • Unit Testing
  • Separating test and production migration fixtures
  • Setting up the table relationships in the models
  • Deployment environment progress update
Details coming soon. Right now TextMate is calling...

Sunday, February 1, 2009

Switching to a Mac


I've been developing with RoR 2.0.2 on my WinXP laptop and I'm running a few scripts very frequently.

rake db:migrate:reset and ruby script/runner script/import/project.rb and most importantly rake test:units.

Running these commands is slow.

Painfully slow.

So slow that I took out a watch and timed it.

It takes 29 seconds for the environment to spin up and then between 1 and 3 seconds to run the actual scripts. You can tell because the first thing most of these scripts do after requiring the framework is printing to standard out. But nothing prints for 29 seconds!!!

So that is 29 seconds of nothing and then a flash of my program running. It got really tedious so I decided to investigate. I started with 2 premises:
  1. The average developer would probably not put up with this and still say Rails was cool
  2. I don't remember Rails being like this when I developed on linux.
Google showed me a pile of web pages where people were complaining about rails scripts being slow. They all had one thing in common... They were running windows.

So I decided to burn an hour and try out my project on my wife's MacBook Pro. And the scripts scream. That 29 second delay is eliminated. And the added bonus is, TextMate is very slick and does not spike the CPU the way Aptana Studio does when I, say, click on a tab.

So I'm switching to my wife's mac. I'm assuming she'll like this because it means I'll finish my thesis about 3 years faster.

Saturday, January 31, 2009

Rails Convention and my silly application_id

When I designed the schema a few of the tables were basically there to store constants. I thought it would be a good idea to add an application_id column because it would allow me to explicitly set the value of that column.

So I added an int column to site_update_frequencies, bug_tool_urls, bug_columns, and a few other tables. I used fixtures to set the values of the application_id for each of these fields as incrementing, but consistent integers.

For example the row with site_update_frequency.name = Daily always has site_update_frequency.application_id = 2 no matter how many times I reset the DB. The site_update_frequency.id field changes with each import.

The idea was that I would tell the project model, which has a foreign key to site_update_frequency, to look at site_update_frequency.application_id instead of site_update_frequency.id.

Then, with a little belongs_to :site_update_frequency magic in the project model, calling project.site_update_frequency.name in the Rails code should give me the name of the site update frequency.

But I forgot that Rails was going to link the objects together through the id field of site_update_frequency. So I spent an hour wondering why project.site_update_frequency == nil.

Now I know it's because the project is linked to site_update_frequency through the id column, which has values in the 10k range, instead of the application_id, which has them in the 1-3 range. I was using the value of the wrong foreign key!!!

SMACKS FOREHEAD.... Duh.

I looked around for how to tell Rails to link the tables through the application_id and I have a few ideas, but after sleeping on it I started asking some different questions.

Q: Why am I trying to use configuration over convention with Rails?
A: "Ummmm I don't know."

Q: Didn't I choose Rails because of the simplicity of its convention?
A: "Sure did!"

Q: And what does the application_id give me?
A: "Not sure... The ability to behave in a sloth-like manner?"

So I've decided to get rid of the application_id columns and simply rely on the rails standard id columns.

Mostly because application_id seems to serve no purpose other than giving me a headache.
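
For the record, the convention I'm falling back on is just the Rails default. A quick sketch using the example above:

# belongs_to joins through site_update_frequencies.id by default;
# nothing here knows or cares about application_id
class Project < ActiveRecord::Base
  belongs_to :site_update_frequency
end

project = Project.find(:first)
puts project.site_update_frequency.name   # => "Daily"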

Saturday, January 17, 2009

The Final Import Script COMPLETE!!! Ownership Changes

There was a lot of persuading, but the import of the legacy data is finally done!

The final legacy data import task is to migrate all the bug_ownership_changes. The entire point of this application is that it applies ownership models to lists of bugs so that you know exactly who owns a bug (should be doing something about that bug). Ownership tends to change throughout the life-cycle of a bug.

There are some edge cases but basically ownership works like this: When a bug is logged it's owned by a manager who needs to assign it. Once assigned it's owned by the engineer to whom it's been assigned. Once it's fixed it's owned by the person that submitted it because they need to verify the engineer that fixed it isn't (and this is a technical term) full-of-crap.

Every once in a while someone comes up with a reason to violate the sanctity of the default ownership model behavior. They decide that the submitter shouldn't be the person to verify the full-of-crap-ness of the engineer, because they are on vacation, and they want to make someone else the owner of the bug.


I would have given my kingdom for a bug system that lets you save arbitrary data, but it was not to be. And so, in order to support the unfortunate, despicable, and some have claimed immoral practice of ownership changes, the legacy application is forced to store ownership change information.

Apparently I forgot about this in my original schema, because there was no hint of a table for this information. Alas, it was easy to add and I have. The table looks like this:

Table: bug_ownership_changes
  • bug_list_id (integer): A foreign key to the id of the bug list to which this change should be applied.
  • bugid (string): This is the actual Bug ID, not an ID from our schema.
  • new_owner_id (integer): This is the poor sap who has had the responsibility of dealing with this bug dumped on him/her.
  • assigner_id (integer): This is the person who unceremoniously dumped the bug in someone else's lap. I've added this column because I may decide to embark on a campaign to publicly shame these people at some point in the future, and I want to make sure I have the data handy. :)
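
The migration for the new table is short. Roughly:

# sketch of the migration for the new table
class CreateBugOwnershipChanges < ActiveRecord::Migration
  def self.up
    create_table :bug_ownership_changes do |t|
      t.integer :bug_list_id    # bug list the change applies to
      t.string  :bugid          # the bug tracker's own ID, not ours
      t.integer :new_owner_id   # the poor sap who now owns the bug
      t.integer :assigner_id    # the person who dumped it on them

      t.timestamps
    end
  end

  def self.down
    drop_table :bug_ownership_changes
  end
end
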
Here's the updated schema:



And with that, and a little Ruby scripting, the final import is ready to go!

rake db:migrate:reset
rake db:fixtures:load
script/runner script/import/project.rb

The script ran. The data is in the DB. I'm done with basic legacy data importing.


What's Next?
The next step is to work out the communication between the controller and the generator. In other words, figure out how to take all the information I just stuck in the DB and send it back to the generator in a way the generator can understand.

Wednesday, January 14, 2009

Too many rows for history!

In the legacy system, "Histories" are used to store the bug count for the cross section of each owner, bug list, and project each time the site is generated. This information is used in historical reporting to show how a person's, team's, or project's bug count for a given bug list has changed over time.
I sat down to write the DB import code for bug count histories tonight and noticed I didn't have a table for it. Suspicious...
So I looked at my old notes and discovered a calculation I did a while back to figure out the number of rows I'd need in my DB.
The Math
Assuming 100 active sites/projects

(active project_people ~ 30,000) * (active bug lists ~ 1000) * (1 year of active data ~365 days) == 10,950,000,000 (Time to call Carl Sagan)

I did a few quick searches and found answers suggesting there was no theoretical limit to DB size as long as you have a nuclear-powered CPU, infinite memory, and a heat sink liquid-cooled by the northern Atlantic Ocean.
BUT... They all talk about things in terms of millions of rows, and keeping individual tables under 500 million rows to keep Loki (god of fire and trickery) from laying waste to their application.
Additional review of my calculations shows:



Description                           Row count        % of DB
Sum of all rows except bug counts     428,058          0.003909%
Sum of History Rows for 1 year        10,950,000,000   99.9960909%

The purpose of the Central Controller was to have a central point for site management. Not to act as a historical data repository that will someday rival amazon.com's.

OK, that's an exaggeration, but it occurs to me that if I store histories in the DB I'm leaving a ticking time bomb for whoever takes over the app. First the central controller will start to slow down, and about a year and a half after we deploy it will either crash the server or heat it up so hot that it melts a hole through the outer mantle of the planet.
Either way, I'm not doing anyone any favors writing this import code.
The Solution

Given:
  • Histories have only one purpose: graphing historical bug counts.
  • I'm planning on replacing the graphing tools with a JavaScript-based graphing tool
  • That graphing tool is going to want historical data in JavaScript
I'm going to:
Not import histories into the DB. Instead I'll store the historical data in JSON as flat files on the web server with the other generated site content.
To do that I need to:
  1. Figure out the best Object Notation for the graphing tool I'm going to use.
  2. Write a one-time script that converts the existing historical data into JSON files
  3. Teach the generator how to read and re-render those files each time it runs.
So I'm moving the histories import mechanism into the next phase when I work on the generator.
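
Step 2 will probably look something like this (the legacy history format and the paths here are guesses for illustration; it also assumes the json gem is installed):

# rough sketch of the one-time conversion: read a legacy history file
# (assumed here to be CSV-ish lines of "date,owner,bug_list,count") and
# write it back out as JSON for the graphing JavaScript to consume.
require 'json'   # the json gem

def convert_history(legacy_path, json_path)
  points = File.readlines(legacy_path).map do |line|
    date, owner, bug_list, count = line.strip.split(',')
    { 'date' => date, 'owner' => owner,
      'bug_list' => bug_list, 'count' => count.to_i }
  end

  File.open(json_path, 'w') { |f| f.write(points.to_json) }
end

convert_history('legacy/sites/foo/output/history/history.txt',
                'generator/sites/foo/output/history/history.json')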

Sunday, January 11, 2009

Projections... Done

I took Friday night off and tackled projections today. They are working, though I'm getting a taste of how intertwined this data is... I mistakenly tried to import the projections after resetting the DB, only to discover there were no projects or people to use as foreign keys.

All the same, it's working!

What's Next



  1. site projections from legacy\sites\[sitename]\output\projections\
  2. bug list histories legacy\sites\[sitename]\output\history
  3. Ownership changes legacy\sites\[sitename]\output\ownership_change


Another minor, but convoluted, schema change...

projections.project_person was simply a mistake. The column is there as a way to tie an expected bug count for a given date to a team.  The ideal way to deal with this concept is to have a Team model. But I don't have one. In the legacy system, teams are inferred by grouping all the people that have the same manager. And projections were always tied to that manager.

So the real relationship I want doesn't exist. I played with the idea of creating a Team model, but per the legacy system, it serves only one purpose, that being something to tie the projections to.

So I decided to skip creating a team and go with something less "right" but more practical. That gives me two choices

1. Owners: This is the model that links People to their managers and to projects. (This is the model that project_persons turned into in a later stage of the schema.)

2. People: This is where each person's username and full name are stored. A person is connected to a Project via the owners table.

So the more right of these two practical (but technically incorrect) choices is to link to the Owner, but most of the time I'm going to want to look up projections based on a manager_id anyway, so it seems silly to store the Owner.id just to have to use it to get the manager's person.id.

If you followed that, I'm very impressed.

Needless to say, I decided to change projection.project_person to projection.person_id because that is what I'm going to want to know when looking up projections.

Here's the updated schema:





The Catch:
Because I elected not to create a Team model, projections still reside with a person (the manager) instead of a team. That means when the manager of a team changes, I need to make sure to go through the projections and update them. It's a small price to pay for something that rarely happens and will make life easier the other 99.9% of the time.

Thursday, January 8, 2009

Queries && People && Teams... All Importing!

Queries, People and their relationships to managers and projects are all importing!


Queries
My decision to go to bed last night was a good one. The bug was, as I suspected, a silly one. I was calling element.elements['display-name'].get_text and there is no get_text method... It's just text.

Getting queries working required another schema update. I changed bug_lists.expert_query and bug_lists.description to be of type :binary, which is the MySQL equivalent of a blob. The legacy data for those fields was too big for a column of type string.

So here, once again is the updated schema:


People & Relationships
Once that was working, I imported all the ownership files. That creates the person objects for everyone in the system and ties them to projects and their managers.

I actually stumbled on code I wrote back in July when I was in upstate NY and didn't have an internet connection. It mostly worked, but it was all brute force code. I did things like manually reading and parsing CSV files instead of using CSV::Reader.parse, and checking the DB for an existing user and creating one manually if they didn't exist instead of using Person.find_or_create_by_username. So I just rewrote it tonight using a lot less code.

What's Next
I'm dangerously close to finishing the data migration. There are only three things left to import:

  1. site projections from legacy\sites\[sitename]\output\projections\
  2. bug list histories legacy\sites\[sitename]\output\history
  3. Ownership changes legacy\sites\[sitename]\output\ownership_change
I plan to tackle them all tomorrow night. Then its on to getting the legacy system talking to the new controller.

Queries are almost importing...

Tonight we went out to dinner. On the way home the roads froze and it took us 40 minutes to get our car 20 feet up our street. We live on a steep hill.

Once that was done and everyone was in bed I got queries importing for every project.

This required some schema changes:
  • I added a scope_filters table to store the constant data about scope filters
  • In the bug_lists table scope_filter (string) was changed to scope_filter(integer) so it could be a foreign key to the scope_filters table.
  • I added ownership_models.legacy_name (string), which gives me a way to reference the internal name by which the legacy generators know the ownership models.

Once again rake db:migrate:reset and rake db:fixtures:load are my friends.

The updated schema looks like this:



Alas, queries are not properly importing yet. There's a bug in thesis_importer.rb, in the code that parses display-name out of the legacy XML. It's probably very easy to fix, but it's also 12:22 AM and my eyes aren't focusing very well, so victory will have to wait till tomorrow night.

Wednesday, January 7, 2009

The Aptana Saga Continues

The developers over at Aptana are on the ball. They've been looking into the bugs I logged ROR-1097 and ROR-1098

So far they haven't found the thing that's causing the CPU to spike on startup and when I mistakenly click on the Rake Tasks tab. But I'm getting by. When the bug hits I just take a coffee break.

I was having trouble being specific about the bug behavior since it manifests for 6 minutes and most of the time the behavior is a spiked CPU and nothing else. So today while I was in a meeting and my PC was idle, I fired up Camtasia and made a screen recording of the bug.

Warning! This is really boring to watch.

Sunday, January 4, 2009

Site Properties Data Import is Done!

Everything from SiteProperties.xml is being read, translated into the new schema, and loaded into the DB. The only remaining part from a few days ago was figuring out the Bug Table Column Order property, and I tackled that tonight.

I added a fixture for the bug_table_columns by taking \legacy\website\configuration\bug_column_configuration.xml and doing regex until it looked like a fixture.
Once I had that fixture set up, the import of the bug column order for a site was trivial.
In the legacy system the bug column order is stored as a list of column names, so I loaded the fixture using YAML.load and then looked up the application_id for each column based on the legacy name. Once I have the application_ids I put them together in a comma-separated string and that goes into projects.table_column_order.
The benefit of using the fixture is I'm guaranteed to have a match between the application_id in the DB and the ones I map in my data import script, since they use the same fixture.

Storing order as a comma separated string is a bit of a kluge, but this order is rarely changed and rarely looked up. It doesn't seem worth storing it in a series of rows when it's so easy to work with.
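
The mapping itself is only a few lines. Roughly (the fixture path and keys are illustrative; treat this as a sketch):

# sketch: map the legacy column names to bug_table_columns application_ids
# using the same fixture the DB loads, then store the order as a CSV string.
require 'yaml'

fixture = YAML.load(File.read('db/fixtures/bug_table_columns.yml'))

# legacy_names would come out of site_properties.xml; made-up values here
legacy_names = ['BugId', 'Severity', 'Owner']

ids = legacy_names.map do |name|
  row = fixture.values.find { |attrs| attrs['legacy_name'] == name }
  row['application_id']
end

# `project` is an already-loaded Project record
project.table_column_order = ids.join(',')
project.save!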

Ruby is my friend
I remain ever impressed with how easy Ruby and Rails make things that used to be much harder. Of course I'm relearning a lot of the syntax as I go, but that just makes it a pleasant surprise every time I think something is going to be difficult (knock on wood).


Thesis.next # => "Queries"
Tomorrow I'm either tackling query data migration or painting my bathroom. If I'm really lucky I'll get to both. Either way Thesis.next # => "Queries"





Schema Changes Tonight

What I changed and why:

  1. Added application_id to the bug_columns table. REASON: So projects and bug lists would have a foreign key they could trust instead of the ever elusive "id" which is a pain to map to during legacy data migrations
  2. Added table_column_order to the projects table. REASON: So each project can have a default bug list table column sort order. This was a legacy feature I overlooked in the original schema design.
  3. Moved application_id to come immediately after id in bug_tool_urls and ownership_models. REASON: I remember from a DB course that it was a little harder for a DB to search on a column that was not a fixed distance from the beginning of a row. Since id is fixed, I just moved application_id to appear immediately after it. And if I'm wrong, no harm no foul.
So the schema now looks like this:
Schema 2009-01-04



db:migrate:reset ROCKS!

With all the legacy data migration testing I've become a HUGE fan of db:migrate:reset which blows away my schema, and rebuilds it from the migrations. It's great because it clears out any previous data as well. Then I just call rake db:fixtures:load and all the constants are loaded.




Gliffy and Schema images
SCHEMA PRE 2009
I learned an unfortunate thing about Gliffy when I updated the Schema doc tonight. I'd been linking to the live image for the latest schema in all my previous posts. That means with the exception of the copy in the proposal all of the images in previous posts are now showing the latest instead of the schema at the time of those posts.

Not really a big deal, but could cause confusion later. Fortunately Gliffy keeps a complete change log, so I can always go back if I really need to see what things looked like before. To the left is the last version of the schema before tonight's changes in case I need it for historical reasons.

Friday, January 2, 2009

Documentation - a better way

I had hoped to reuse the online help/documentation from the legacy system, but that is looking less and less likely for three reasons.

  1. The admin UI is being completely rewritten
  2. Tables generated for the sites will use JavaScript based sorting
  3. Graphs will be dynamically generated
As small as that list is, it means changes to the online help, which was originally written in RoboHelp. RoboHelp is a problem because you can only edit the help if you have a RoboHelp license, and even then it's a pain to use.

I don't have a license anymore. So I've decided to just move the documentation into a wiki and update the help links to point to the wiki. That way, editing becomes a simple task for anyone using the system, and it's free.

I can't believe I didn't have access to a wiki when I wrote the legacy system. Times sure have changed.

Thursday, January 1, 2009

State of the Union

Quick update on where things stand after a hectic few months of infrequent development.

Solving Two Problems
My thesis advisor jumped in and helped with my Two Problems. The answer to the MySQL install problem was to fiddle with it until I lost my mind, then to follow my advisor's advice and remove the second copy of MySQL that was running from the Bitnami Rails Stack. I bailed on Bitnami after the first few weeks of dev, but I never bothered to remove it until it burned me.

His solution to the logging problem was priceless. I said "It hurts when I try to do logging this way." He said "Stop doing logging that way." [Hand smacks head... DUH!] I was so fixated on getting it working I forgot to check if it was a good idea. Linking the data import script to the webapp logs was not only painful to implement, but would have been painful to use. So I just set up a log file for the data importing process. QED.

Development Mechanics

I develop data migration code in script/import/archive.rb. I start by running that code...

     ruby script/import/archive.rb

...and fixing any bugs. This allows me to develop the data migration code and test it without spinning up a new Rails environment every time I want to try something out, but it also means I can't do end-to-end testing from legacy data to new database data. Once archive.rb is working I move that code into lib/thesis_importer.rb and test to make sure the data is actually making it into the DB correctly.

      script/runner script/import/project.rb

 It's dirty scripting/data-manipulation work, but it's gotta be done.


Currently Working
Everything from site_properties.xml is now importing correctly (except for the bug_list_column_order). That includes a bunch of data transformations from the old file-based schema to the new relational schema. Now when I run project.rb it:

  1. creates a new project in the data base
  2. populates all the immediate data like name, projections etc, 
  3. links all the constant tables like bug_tool_url, 
  4. finds or creates a person object for that administrator and links them through the project_administrator table.
  5. writes out a useful log file about what it did
  6. doesn't produce any errors


The problem with bug_list_column_order is that I made a mistake with the original schema. The schema is missing a way to set the default bug column display order for a project. I have it on a per-bug-list basis in the bug_lists table, but not for the project. In the legacy system it used to be set for the project and overridable by each bug list.

So I need to:

  1. add a t.string :table_column_order to the project table (still considering comma separated string of column ids to handle order. not convinced this is a good idea yet.)
  2. Update the YML file for the fixture for bug table columns to have the column data from the legacy constants file
  3. write a mapping function between site properties and the new settings 
  4. put the default order into the Projects table during import.


Stuff I did today

  1. Cleaned up a bunch of strange stale bugs in thesis_importer.rb that were keeping it from running.
  2. Got site administrators populating correctly
  3. Wet sledding with my kids.
  4. Got BugToolUrl constants in the database and importing from site properties working correctly. 
  5. Fought with a CPU maxing out issue in Aptana Studio. Googled it silly and still didn't find a solution so I logged a bug. I wish I was doing this on a mac...
  6. Wrote this post.
42 days left till the due date. It's crunch time.