Saturday, January 31, 2009

Rails Convention and my silly application_id

When I designed the schema a few of the tables were basically there to store constants. I thought it would be a good idea to add an application_id column because it would allow me to explicitly set the value of that column.

So I added an int column to site_update_frequencies, bug_tool_urls, bug_columns, and a few other tables. I used fixtures to set the values of the application_id for each of these fields as incrementing, but consistent integers.

For example the row with site_update_frequency.name = Daily always has site_update_frequency.application_id = 2 no matter how many times I reset the DB. The site_update_frequency.id field changes with each import.

The idea was that I would tell project model, which has a foreign key to site_update_frequency, to look at site_update_frequency.application_id instead of site_update_frequency.id and then I'd be able to call.

In the project model a little belongs_to :site_update_frequency magic and when I call project.site_update_frequency.name in the rails code I should get the name of the site update frequency.

But I forgot that Rails was going link the objects together through the id field of site_update_frequency. So I spent an hour wondering why project.site_update_frequency == nil

Now I know it's because the project is linked to site_update_frequency through the id column which has values kn the 10k range instead of the application_id which has them in the 1-3 range. I'm using the value of the wrong Foreign Key!!!

SMACKS FOREHEAD.... Duh.

I looked around for how to tell rails to link the tables through the application_id and I have a few ideas, but after sleeping on it I started asking a different questions.

Q: Why am I trying to use configuration over convention with Rails?
A: "Ummmm I don't know."

Q: Didn't I choose Rails because of the simplicity of its convention?
A: "Sure did!"

Q: And what does the application_id give me?
A: "Not sure... The ability to behave in a sloth-like manner?"

So I've decided to get rid of the application_id columns and simply rely on the rails standard id columns.

Mostly because application_id seems to serve no purpose other than giving me a headache.

Saturday, January 17, 2009

The Final Import Script COMPLETE!!! Ownership Changes

There was a lot of persuading, but the import of  the legacy data is finally done!

The final legacy data import task is to migrate all the bug_ownership_changes. The entire point of this application is that it applies ownership models to lists of bugs so that you know exactly who owns a bug (should be doing something about that bug). Ownership tends to change throughout the life-cycle of a bug.

There are some edge cases but basically ownership works like this: When a bug is logged it's owned by a manager who needs to assign it. Once assigned it's owned by the engineer to whom it's been assigned. Once it's fixed it's owned by the person that submitted it because they need to verify the engineer that fixed it isn't (and this is a technical term) full-of-crap.

Every once in a while someone comes up with a reason to violate the sanctity of the default ownership model behavior. They decide that the submitter shouldn't be the person to verify the full-of-crap-ness of the engineer, because they are on vacation, and they want to make someone else the owner of the bug.


I would have given my kingdom for a bug system that lets you save arbitrary data, but it was not to be. And so, in order to support the unfortunate, despicable, and some have claimed immoral practice of ownership changes, the legacy application is forced to store ownership change information.

Apparently I forgot about this in my original schema, because there was no hint of a table for this information. Alas, it was easy to add and I have. The table looks like this

Table: bug_ownership_changes
bug_list_id (integer): A foreign key to the id for the the bug list to which this change should be applied.
bugid (string): This is the actual Bug ID, and not an ID from our schema.
new_owner_id (integer): This is the poor sap that has had the responsibility of dealing with this bug dumped on him/her.
assigner_id (integer): This is the person that unceremoniously dumped the bug in someone else's lap. I've added this column because I may decided to embark on a campaign to publicly shame these people at some point in the future, and I want to make sure I have the data handy. :)
Here's the updated schema:



And with that and a little ruby scripting the final import is ready to go!

rake db:migrate:reset
rake db:fixtures:load
script/runner script/import/project.rb

The script ran. The data is in the DB. I'm done with basic legacy data importing.


What's Next?
Next step is to work out the communication between the controller and the generator. In other words, figure out how to take all the information I just stuck in the DB and send it back to the generator in a way the generator can understand it.

Wednesday, January 14, 2009

Too many rows for history!

In the legacy system " Histories" are used to store the bug count for the cross section of each owner, buglist and project each time the site is generated. This information is used in historical reporting to show how a person's, team's, or project's bug count for a given bug list has changed over time.
I sat down to write the DB import code for bug count histories tonight and noticed I didn't have a table for it. Suspicious...
So I looked at my old notes and discovered a calcuation I did a while back to figure out the number of rows I'd need in my DB.
The Math
Assuming 100 active sites/projects

(active project_people ~ 30,000) * (active bug lists ~ 1000) * (1 year of active data ~365 days) == 10,950,000,000 (Time to call Carl Sagan)

I did a few quick searches and found answers suggesting there was no theoretical limit to DB size as long as you have a nudlear powered CPU, infinate memory and a heat sink liquid cooled by the nothern Atlantic ocean.
BUT... They all talk about things in terms of millions of rows and keeping individual tables under 500 million rows to keep from encouraging Loki (god of fire and trickery) from laying waste to their application.
Additional review of my calculations shows:



Descriptionrow count% of DB
Sum of all rows except bug counts:428,0580.003909%
Sum of History Rows for 1 year:10,950,000,00099.9960909%

The purpose of the Central Controller was to have a central point for site management. Not to act as a historical data repository that will someday rival amazon.com's.

OK that's an exaggeration, but it occurs to me that if I store histories in the DB I'm leaving a ticking time bomb for whoever takes over the app. First it the central controller will start to slow down and about a 1.5 years after we deploy it will either crash the server or heat it up so hot that it will melt a hole through the outer mantle of the planet.
Either way, I'm not doing anyone any favors writing this import code.
The Solution

Given:
  • Histories have only one purpose, graphing historical bug counts.
  • I'm planning on replacing the graphing tools using a javascript based graphing tool
  • That graphing tool is going to want historical data in Javascript
I'm going to:
Not import histories into the DB. Instead I'll store the historical data in JSON as flat files on the web server with the other generated site content.
To do that I need to:
  1. Figure out the best Object Notation for the graphing tool I'm going to use.
  2. Write a one time script that writes converts the existing historical data into JSON files
  3. Teach the generator how to read and re-render those files each time it runs.
So I'm moving the histories import mechanism into the next phase when I work on the generator.

Sunday, January 11, 2009

Projections... Done

I took friday night off, and tackled projections today. They are working, though I'm getting a taste of how intertwined this data is... I mistakenly tried to import the projections after resetting the DB only to discover there were no projects or people to use as foreign keys.

All the same, it's working!

What's Next



  1. site projections from legacy\sites\[sitename]\output\projections\
  2. bug list histories legacy\sites\[sitename]\output\history
  3. Ownership changes legacy\sites\[sitename]\output\ownership_change


Another minor, but convoluted, schema change...

projections.project_person was simply a mistake. The column is there as a way to tie an expected bug count for a given date to a team.  The ideal way to deal with this concept is to have a Team model. But I don't have one. In the legacy system Teams are infered by grouping all the people that have the same manager. And projections were always tied to that manager.

So the real relationship I want doesn't exist. I played with the idea of creating a Team model, but per the legacy system, it serves only one purpose, that being something to tie the projections to.

So I decided to skip creating a team and go with something less "right" but more practical. That gives me two choices

1. Owners This is the Model that links People to their managers and to projects. (This is the model that project_persons turned into in a latter stage of the schema.)

2. People This is where people's username and full name are stored. A persson is connected to a Project via the owners table.

So the most right way of these two practical (but technically incorrect) choices is to link to the Onwer, but most of the time I'm going to want to look up projections based on a manager_id anyway, so it seems silly to store the Owner.id just to have to use it to get the managers person.id.

If you followed that you, I'm very impressed.

Needless to say, I decided to change projection.project_person to projection.person_id because that is what I'm going to want to know when looking up projections.

Here's the updated schema:





The Catch:
Because I elected not to create a Team model that means that projections still reside with a person (the manager) instead of a team. That means when the manager of a team changes, I need to make sure to go through the projections and update them. It's a small price to pay for something that rarely happens and will make life easier the other 99.9% of the time.

Thursday, January 8, 2009

Queries && People && Teams... All Importing!

Queries, People and their relationships to managers and projects are all importing!


Queries
My decision to go to bed last night was a good one. The bug was as I suspected a silly one. I was calling element.elements['display-name'].get_text and there is no get_text method... It's just text.

In order to get queries working required another schema update. I changed bug_lists.expert_query and bug_lists.description to be of type :binary which is MySql equivalent of a blob. The legacy data for those fields was to big for a column type string.

So here, once again is the updated schema:


People & Relationships
Once that was working, I imported all the ownership files. That creates the person objects for everyone in the system and ties them to projects and their managers.

I actually stumbled on code I wrote back in July when I was in upstate NY and didn'y have an internet connection. It mostly worked but it was all brite force code. I did things like manually reading and parsing CSV files instead of using CSV::Reader.parse and checking the DB for an existing user and creating one manually if they didnt exist instead of using Person.find_or_create_by_username. So I just rewrote it tonight using a lot less code.

What's Next
I'm dangerously close to finishing the data migration. There are only three things left to import:

  1. site projections from legacy\sites\[sitename]\output\projections\
  2. bug list histories legacy\sites\[sitename]\output\history
  3. Ownership changes legacy\sites\[sitename]\output\ownership_change
I plan to tackle them all tomorrow night. Then its on to getting the legacy system talking to the new controller.

Queries are almost importing...

Tonight we went out to dinner. On the way home the roads froze and it took u 40 minutes to get our car 20 feet up our street. We live on a steep hill.

Once that was done and everyone was in bed I got queries importing for every project.

This required some schema changes:
  • I added a scope filter table to store the constants data about scope filters
  • In the bug_lists table scope_filter (string) was changed to scope_filter(integer) so it could be a foreign key to the scope_filters table.
  • I added ownership_models.legacy_name (string) which gives me a way to reference the internal name the by which the legacy generators know the ownership models.

Once again rake db:migrate:reset and rake db:fixtures:load are my friends.

The updated schema looks like this:



Alas, queries are not properly importing yet. There's bug thesis_importer.rb in the code that parses display-name out of the legacy XML. It's probably very easy to fix, but it's also 12:22 AM and my eyes aren't focusing very well, so victory will have to wait till tomorrow night.

Wednesday, January 7, 2009

The Aptana Saga Continues

The developers over at Aptana are on the ball. They've been looking into the bugs I logged ROR-1097 and ROR-1098

So far they haven't found the thing that's causing the CPU to spike on startup and when I mistakenly click on the Rake Tasks tab. But I'm getting by. When the bug hits I just take a coffee break.

I was having trouble being specific about the bug behavior since it manifests for 6 minutes and most of the time the behavior is spiked CPU and nothing else. So today while I was in a meeting and my PC was idol, I fired up Camtasia and made a screen recording of the bug.

Warning! This is really boring to watch.

Sunday, January 4, 2009

Site Properties Data Import is Done!

Everything from SiteProperties.xml is being read translated into the new schema and loaded into the DB. The only remaining part from a few days ago was figuring out the Bug Table Column Order property and I tackled that tonight.

I added a fixture for the bug_table_columns by taking the \legacy\website\configuration\bug_column_configuration.xml and doing regex until it looked like a fixture.
Once I had that fixture set up the import of the bug column order for a site was trivial.
In the legacy system the bug column order is stored as a list of column names so I loaded the fixture using YAML.load and then looked up the application_id for each column based on the legacy name. Once I have the application_id I put the IDs together in a comma separated string and that goes into projects.table_column_order
The benefit of using the fixture is I'm guaranteed to have a match between the application_id in the DB and the ones I map using my data import script since they use the same fixture.

Storing order as a comma separated string is a bit of a kluge, but this order is rarely changed and rarely looked up. It doesn't seem worth storing it in a series of rows when it's so easy to work with.

Ruby is my friend
I remain ever impressed with how easy Ruby and Rails makes things that used to be much harder. Of course I'm relearning a lot of the syntax as I go, but that just makes it a pleasant surprise every time I think something is going to be difficult (knock on wood).


Thesis.next # => "Queries"
Tomorrow I'm either tackling query data migration or painting my bathroom. If I'm really lucky I'll get to both. Either way Thesis.next # => "Queries"




toothpaste for dinner
toothpastefordinner.com

Schema Changes Tonight

What I changed and why:

  1. Added application_id to the bug_columns table. REASON: So projects and bug lists would have a foreign key they could trust instead of the ever elusive "id" which is a pain to map to during legacy data migrations
  2. Added table_column_order to the projects table. REASON: So each project can have a default bug list table column sort order. This was a legacy feature I overlooked in the original schema design.
  3. Moved application_id to come immediately after id in bug_tool_urls and ownership_models. REASON: I remember from a DB course that it was a little harder for a DB to search on a column that was not a fixed distance from the beginning of a row. Since id is fixed, I just moved application_id to appear immediately after it. And if I'm wrong, no harm no foul.
So the schema now looks like this:
Schema 2009-01-04



db:migrate:reset ROCKS!

With all the legacy data migration testing I've become a HUGE fan of db:migrate:reset which blows away my schema, and rebuilds it from the migrations. It's great because it clears out any previous data as well. Then I just call rake db:fixtures:load and all the constants are loaded.




Gliffy and Schema images
SCHEMA PRE 2009I learned an unfortunate thing about Gliffy when I updated the Schema doc tonight. I'd been linking to the live image for the latest schema in all my previous posts. That means with the exception of the copy in the proposal all of the images in previous posts are now showing the latest instead of the schema at the time of those posts.

Not really a big deal, but could cause confusion later. Fortunately Gliffy keeps a complete change log, so I can always go back if I really need to see what things looked like before. To the left is the last version of the schema before tonight's changes in case I need it for historical reasons.

Friday, January 2, 2009

Documentation - a better way

I had hoped to reuse the online help/documentation from the legacy system, but that is looking less and less likely for three reasons.

  1. The admin UI is being completely rewritten
  2. Tables generated for the sites will use JavaScript based sorting
  3. Graphs will be dynamically generated
As small as that is, it means changes to the online help which was originally written in Robohelp. Robohelp is a problem because you can only edit it if you have a RoboHelp license, and even then it's a pain to use.

I don't have a license anymore. So I've decided to just move the documentation into a wiki and update the help links to point to the wiki. That way, editing becomes a simple task for anyone using the system, and it's free.

I can't believe I didn't have access to a wiki when I wrote the legacy system. Times sure have changed.

Thursday, January 1, 2009

State of the Union

Quick update on where things stand after a hectic few months of infrequent development.

Solving Two Problems
My Thesis advisor jumped in and helped with my Two Problems. The answer to the MySQL install was to fiddle with it until I lost my mind. Then to follow my advisors advice and remove the second copy of MySQL that was running from the Bitnami Rails Stack. I bailed on Bitnami after the first few weeks of dev, but I never bothered to remove it until it burned me.

His solution to the logging problem was priceless. I said "It hurts when I try to do logging this way." He said "stop doing logging that way." [Hand smacks head... DUH!] I was so fixated on getting it working I forgot to check if it was a good idea. Linking the data import script to the webapp logs was not only painful to implement, but would have been painful to use. So i just set up a log file for the data importing process. QED.

Development Mechanics

I develop data migration code in script/import/archive.rb. I start by running that code...

     ruby script/import/archive.rb

...and fixing any bugs. This allows me to develop the data migration code and test it without spinning up a new rails environment every time I want to try something out, but it also means I cant do end to end testing legacy-data to new database data. Once archive.rb is working I move that code into lib/thesis_importer.rb and test to make sure the data is actually making it into the DB correctly.

      script/runner script/import/project.rb

 It's dirty scripting/data-manipulation work, but it's gotta be done.


Currently Working
Everything from site_properties.xml is now importing correctly (except for the bug_list_column_order). That includes a bunch of data transformations from the old file based schema to the new relational schema. Now when I run project.rb it

  1. creates a new project in the data base
  2. populates all the immediate data like name, projections etc, 
  3. links all the constant tables like bug_too_url, 
  4. finds or creates a person object for that administrator and links them through the project_administrator table.
  5. writes out a useful log file about what it did
  6. doesn't produce any errors


The problem with bug_list_column_order is that I made a mistake with the original schema. The schema is missing a way to set the default bug column display order for a project. I have it on a per bug basis in the bug_lists table , but not for the project. In the legacy system it used to be set for the project and overridable

by each bug list.

So I need to:

  1. add a t.string :table_column_order to the project table (still considering comma separated string of column ids to handle order. not convinced this is a good idea yet.)
  2. Update the YML file for the fixture for bug table columns to have the column data from the legacy constants file
  3. write a mapping function between site properties and the new settings 
  4. put the default order into the Projects table during import.


Stuff I did today

  1. Cleaned up a bunch of strange stale bugs in thesis_importer.rb that were keeping it from running.
  2. Got site administrators populating correctly
  3. Wet sledding with my kids.
  4. Got ButToolUrl constants in the database and importing from site properties working correctly. 
  5. Fought with a CPU maxing out issue in Aptana Studio. Googled it silly and still didn't find a solution so I logged a bug. I wish I was doing this on a mac...
  6. Wrote this post.
42 days left till the due date. It's crunch time.