Thesis Development: schema

Showing posts with label schema. Show all posts

Saturday, February 21, 2009

Development Status Update

The blog has been quiet because I'm going flat out on development with a goal of finishing on March 1st.

The following is a quick brain dump of my progress. No promises on spelling here. It's 1:30am on a Friday after a particularly long week.

Active MQ is Gone:
I've eliminated ActiveMQ and replaced it with basic web service calls from the generators. REST wasn't in rails the last time I developed a full app. I like what they've done with the 2.0. It has a decidedly CRUD feel all the way through.

The switch from AMQ to REST greatly simplified communications mechanism and got around a limitation in the way the rails plug-in for AMQ handles, or rather doesn't handle temporary queues.

I'll post the communication message call flow as soon as I'm done testing it (see generator perl below).

UI Update / Schema

Most of the admin UI is built. I used Active Scaffold. It took a while, but I figured out how to organize the UI work flow so that it worked around Active Scaffold limitations.

This did require a change to the schema, but it's a really good change. I was previously keeping team information in the owners table.

The Owners table had the columns project_id, person_id, manager_id. The applicaiton logic stated that if person_id == manager_id then that person was the manager of the team.

I found myself writing a lot of code to deal with this logical relationship and when I tried to make it work with active scaffold it really started to hurt. So I changed the schema.

Now there is a Team table:

class CreateTeams < ActiveRecord::Migration
def self.up
    create_table :teams do |t|
      t.integer :project_id
      t.integer :person_id (this is the manager)

      t.timestamps
    end
end

And the Owners Table now looks like this:
class CreateOwners < ActiveRecord::Migration
def self.up
    create_table :owners do |t|
      t.integer :team_id
      t.integer :person_id

      t.timestamps
    end
end

I deleted a tun of code and acivescaffold just works!

Convetion over configuration every time!!!

Giving Active Scaffold a Push - Dynamic action_links
For the sake of UI consistency I did make one hack on top of Active Scaffold. I needed a link at the top of the table that had a dynamic value in it. On the teams table I wanted a link to add a new team and I needed to pass the current project ID. That requires a fair rewrite of the way active scaffold passes parameters when creating action links.

Instead of doing it the hard way I just replaced the links by modifying the resulting pages DOM with some JavaScript. http://code.google.com/p/thesisdev/source/browse/trunk/controller/app/views/teams/for_project.html.erb

CGI::Session::CookieStore::CookieOverflow
Every project has a bug that just scares the crap out of me. Getting 500 server errors because of a Cookie overflow was that bug for me. It happened at random on certain pages. It turned out to be the way I was using Active Scaffold.

I was rendering with this line

:active_scaffold => 'teams',
:constraints => {:project_id => @project}

Apparently Active Scaffold stores constrains in the session object. So after refreshing a few times and the leaving and returing to this page BOOOOOMM!!! All that project info was stored in the cookie and it overflowed its 4k limit.

The fix was simple. Just change @project to @project.id so the new code is

:active_scaffold => 'teams',
:constraints => {:project_id => @project.id}

And that get's rid of the cookie overflows.

Active Scaffold Sortable is a lie!

There's this amazing screencast showing a rails plugin that allows you to drag and drop records to reorder them. When I install the plugin I can't even get the server to start. I've tried every permutation of rails/active_scaffold version I can think of, posted emails on forums and even emailed the developer. Result = silence.

Based on the other forum posts it appears the code worked for a brief window when stars aligned and no one touched the active_scaffold code base. That time passed, and no one tagged the source tree, so it is lost forever. Sadly I'm giving up on sortable, and will have to figure out some other way to order bug lists and table column headers.

Generator Perl:
The perl package that enables the generator to talk to the controller via REST is written. Testing with the controller commences tomorrow.

I still have to integrate this with the generator, but I had a look through the generator logic and it's all neatly contained in a single well documented script. [Thank you me from 5 years ago!] Once the controller is tested I'll be able to drop the communication package in, and add/edit a few simple lines, to make the controller the master of the generators workload.

Deployment:
Three Red Hat Linux Virtual Machines should be available early next week so I can test the system in a truly distributed fashion.

What's Left in Development:

Error checking logic for adding teams, owners and administrators
Changing the generator to listen to the controller
Controller / generator integration
Controller UI for generators and generations
Cloning Projects, Bug_Lists and Teams
Lots of testing
Deployment to the test bed

Once it's deployed I'm going to start writing my paper. The applicaion will have 3 weeks of real life soak time while I'm writing.

While I don't plan to deploy it in production before I finish the thesis, it should be very obvious if there are any problems with the system.

Saturday, January 31, 2009

Rails Convention and my silly application_id

When I designed the schema a few of the tables were basically there to store constants. I thought it would be a good idea to add an application_id column because it would allow me to explicitly set the value of that column.

So I added an int column to site_update_frequencies, bug_tool_urls, bug_columns, and a few other tables. I used fixtures to set the values of the application_id for each of these fields as incrementing, but consistent integers.

For example the row with site_update_frequency.name = Daily always has site_update_frequency.application_id = 2 no matter how many times I reset the DB. The site_update_frequency.id field changes with each import.

The idea was that I would tell project model, which has a foreign key to site_update_frequency, to look at site_update_frequency.application_id instead of site_update_frequency.id and then I'd be able to call.

In the project model a little belongs_to :site_update_frequency magic and when I call project.site_update_frequency.name in the rails code I should get the name of the site update frequency.

But I forgot that Rails was going link the objects together through the id field of site_update_frequency. So I spent an hour wondering why project.site_update_frequency == nil

Now I know it's because the project is linked to site_update_frequency through the id column which has values kn the 10k range instead of the application_id which has them in the 1-3 range. I'm using the value of the wrong Foreign Key!!!

SMACKS FOREHEAD.... Duh.

I looked around for how to tell rails to link the tables through the application_id and I have a few ideas, but after sleeping on it I started asking a different questions.

Q: Why am I trying to use configuration over convention with Rails?
A: "Ummmm I don't know."

Q: Didn't I choose Rails because of the simplicity of its convention?
A: "Sure did!"

Q: And what does the application_id give me?
A: "Not sure... The ability to behave in a sloth-like manner?"

So I've decided to get rid of the application_id columns and simply rely on the rails standard id columns.

Mostly because application_id seems to serve no purpose other than giving me a headache.

Saturday, January 17, 2009

The Final Import Script COMPLETE!!! Ownership Changes

There was a lot of persuading, but the import of the legacy data is finally done!

The final legacy data import task is to migrate all the bug_ownership_changes. The entire point of this application is that it applies ownership models to lists of bugs so that you know exactly who owns a bug (should be doing something about that bug). Ownership tends to change throughout the life-cycle of a bug.

There are some edge cases but basically ownership works like this: When a bug is logged it's owned by a manager who needs to assign it. Once assigned it's owned by the engineer to whom it's been assigned. Once it's fixed it's owned by the person that submitted it because they need to verify the engineer that fixed it isn't (and this is a technical term) full-of-crap.

Every once in a while someone comes up with a reason to violate the sanctity of the default ownership model behavior. They decide that the submitter shouldn't be the person to verify the full-of-crap-ness of the engineer, because they are on vacation, and they want to make someone else the owner of the bug.

I would have given my kingdom for a bug system that lets you save arbitrary data, but it was not to be. And so, in order to support the unfortunate, despicable, and some have claimed immoral practice of ownership changes, the legacy application is forced to store ownership change information.

Apparently I forgot about this in my original schema, because there was no hint of a table for this information. Alas, it was easy to add and I have. The table looks like this

Table: bug_ownership_changes

bug_list_id (integer): A foreign key to the id for the the bug list to which this change should be applied.
bugid (string): This is the actual Bug ID, and not an ID from our schema.
new_owner_id (integer): This is the poor sap that has had the responsibility of dealing with this bug dumped on him/her.
assigner_id (integer): This is the person that unceremoniously dumped the bug in someone else's lap. I've added this column because I may decided to embark on a campaign to publicly shame these people at some point in the future, and I want to make sure I have the data handy. :)

Here's the updated schema:

And with that and a little ruby scripting the final import is ready to go!

rake db:migrate:reset
rake db:fixtures:load
script/runner script/import/project.rb

The script ran. The data is in the DB. I'm done with basic legacy data importing.

What's Next?
Next step is to work out the communication between the controller and the generator. In other words, figure out how to take all the information I just stuck in the DB and send it back to the generator in a way the generator can understand it.

Wednesday, January 14, 2009

Too many rows for history!

In the legacy system " Histories" are used to store the bug count for the cross section of each owner, buglist and project each time the site is generated. This information is used in historical reporting to show how a person's, team's, or project's bug count for a given bug list has changed over time.

I sat down to write the DB import code for bug count histories tonight and noticed I didn't have a table for it. Suspicious...

So I looked at my old notes and discovered a calcuation I did a while back to figure out the number of rows I'd need in my DB.

The Math

Assuming 100 active sites/projects

(active project_people ~ 30,000) * (active bug lists ~ 1000) * (1 year of active data ~365 days) == 10,950,000,000 (Time to call Carl Sagan)

I did a few quick searches and found answers suggesting there was no theoretical limit to DB size as long as you have a nudlear powered CPU, infinate memory and a heat sink liquid cooled by the nothern Atlantic ocean.

BUT... They all talk about things in terms of millions of rows and keeping individual tables under 500 million rows to keep from encouraging Loki (god of fire and trickery) from laying waste to their application.

Additional review of my calculations shows:

Description	row count	% of DB
Sum of all rows except bug counts:	428,058	0.003909%
Sum of History Rows for 1 year:	10,950,000,000	99.9960909%

The purpose of the Central Controller was to have a central point for site management. Not to act as a historical data repository that will someday rival amazon.com's.

OK that's an exaggeration, but it occurs to me that if I store histories in the DB I'm leaving a ticking time bomb for whoever takes over the app. First it the central controller will start to slow down and about a 1.5 years after we deploy it will either crash the server or heat it up so hot that it will melt a hole through the outer mantle of the planet.

Either way, I'm not doing anyone any favors writing this import code.

The Solution

Given:

Histories have only one purpose, graphing historical bug counts.
I'm planning on replacing the graphing tools using a javascript based graphing tool
That graphing tool is going to want historical data in Javascript

I'm going to:

Not import histories into the DB. Instead I'll store the historical data in JSON as flat files on the web server with the other generated site content.

To do that I need to:

Figure out the best Object Notation for the graphing tool I'm going to use.
Write a one time script that writes converts the existing historical data into JSON files
Teach the generator how to read and re-render those files each time it runs.

So I'm moving the histories import mechanism into the next phase when I work on the generator.

Sunday, January 11, 2009

Projections... Done

I took friday night off, and tackled projections today. They are working, though I'm getting a taste of how intertwined this data is... I mistakenly tried to import the projections after resetting the DB only to discover there were no projects or people to use as foreign keys.

All the same, it's working!

What's Next

~~site projections from legacy\sites\[sitename]\output\projections\~~
bug list histories legacy\sites\[sitename]\output\history
Ownership changes legacy\sites\[sitename]\output\ownership_change

Another minor, but convoluted, schema change...

projections.project_person was simply a mistake. The column is there as a way to tie an expected bug count for a given date to a team. The ideal way to deal with this concept is to have a Team model. But I don't have one. In the legacy system Teams are infered by grouping all the people that have the same manager. And projections were always tied to that manager.

So the real relationship I want doesn't exist. I played with the idea of creating a Team model, but per the legacy system, it serves only one purpose, that being something to tie the projections to.

So I decided to skip creating a team and go with something less "right" but more practical. That gives me two choices

1. Owners This is the Model that links People to their managers and to projects. (This is the model that project_persons turned into in a latter stage of the schema.)

2. People This is where people's username and full name are stored. A persson is connected to a Project via the owners table.

So the most right way of these two practical (but technically incorrect) choices is to link to the Onwer, but most of the time I'm going to want to look up projections based on a manager_id anyway, so it seems silly to store the Owner.id just to have to use it to get the managers person.id.

If you followed that you, I'm very impressed.

Needless to say, I decided to change projection.project_person to projection.person_id because that is what I'm going to want to know when looking up projections.

Here's the updated schema:

The Catch:
Because I elected not to create a Team model that means that projections still reside with a person (the manager) instead of a team. That means when the manager of a team changes, I need to make sure to go through the projections and update them. It's a small price to pay for something that rarely happens and will make life easier the other 99.9% of the time.

Thursday, January 8, 2009

Queries && People && Teams... All Importing!

Queries, People and their relationships to managers and projects are all importing!

Queries
My decision to go to bed last night was a good one. The bug was as I suspected a silly one. I was calling element.elements['display-name'].get_text and there is no get_text method... It's just text.

In order to get queries working required another schema update. I changed bug_lists.expert_query and bug_lists.description to be of type :binary which is MySql equivalent of a blob. The legacy data for those fields was to big for a column type string.

So here, once again is the updated schema:

People & Relationships
Once that was working, I imported all the ownership files. That creates the person objects for everyone in the system and ties them to projects and their managers.

I actually stumbled on code I wrote back in July when I was in upstate NY and didn'y have an internet connection. It mostly worked but it was all brite force code. I did things like manually reading and parsing CSV files instead of using CSV::Reader.parse and checking the DB for an existing user and creating one manually if they didnt exist instead of using Person.find_or_create_by_username. So I just rewrote it tonight using a lot less code.

What's Next
I'm dangerously close to finishing the data migration. There are only three things left to import:

site projections from legacy\sites\[sitename]\output\projections\
bug list histories legacy\sites\[sitename]\output\history
Ownership changes legacy\sites\[sitename]\output\ownership_change

I plan to tackle them all tomorrow night. Then its on to getting the legacy system talking to the new controller.

Queries are almost importing...

Tonight we went out to dinner. On the way home the roads froze and it took u 40 minutes to get our car 20 feet up our street. We live on a steep hill.

Once that was done and everyone was in bed I got queries importing for every project.

This required some schema changes:

I added a scope filter table to store the constants data about scope filters
In the bug_lists table scope_filter (string) was changed to scope_filter(integer) so it could be a foreign key to the scope_filters table.
I added ownership_models.legacy_name (string) which gives me a way to reference the internal name the by which the legacy generators know the ownership models.

Once again rake db:migrate:reset and rake db:fixtures:load are my friends.

The updated schema looks like this:

Alas, queries are not properly importing yet. There's bug thesis_importer.rb in the code that parses display-name out of the legacy XML. It's probably very easy to fix, but it's also 12:22 AM and my eyes aren't focusing very well, so victory will have to wait till tomorrow night.

Sunday, January 4, 2009

Schema Changes Tonight

What I changed and why:

Added application_id to the bug_columns table. REASON: So projects and bug lists would have a foreign key they could trust instead of the ever elusive "id" which is a pain to map to during legacy data migrations
Added table_column_order to the projects table. REASON: So each project can have a default bug list table column sort order. This was a legacy feature I overlooked in the original schema design.
Moved application_id to come immediately after id in bug_tool_urls and ownership_models. REASON: I remember from a DB course that it was a little harder for a DB to search on a column that was not a fixed distance from the beginning of a row. Since id is fixed, I just moved application_id to appear immediately after it. And if I'm wrong, no harm no foul.

So the schema now looks like this:

db:migrate:reset ROCKS!

With all the legacy data migration testing I've become a HUGE fan of db:migrate:reset which blows away my schema, and rebuilds it from the migrations. It's great because it clears out any previous data as well. Then I just call rake db:fixtures:load and all the constants are loaded.

Gliffy and Schema images

I learned an unfortunate thing about Gliffy when I updated the Schema doc tonight. I'd been linking to the live image for the latest schema in all my previous posts. That means with the exception of the copy in the proposal all of the images in previous posts are now showing the latest instead of the schema at the time of those posts.

Not really a big deal, but could cause confusion later. Fortunately Gliffy keeps a complete change log, so I can always go back if I really need to see what things looked like before. To the left is the last version of the schema before tonight's changes in case I need it for historical reasons.

Thursday, January 1, 2009

State of the Union

Quick update on where things stand after a hectic few months of infrequent development.

Solving Two Problems
My Thesis advisor jumped in and helped with my Two Problems. The answer to the MySQL install was to fiddle with it until I lost my mind. Then to follow my advisors advice and remove the second copy of MySQL that was running from the Bitnami Rails Stack. I bailed on Bitnami after the first few weeks of dev, but I never bothered to remove it until it burned me.

His solution to the logging problem was priceless. I said "It hurts when I try to do logging this way." He said "stop doing logging that way." [Hand smacks head... DUH!] I was so fixated on getting it working I forgot to check if it was a good idea. Linking the data import script to the webapp logs was not only painful to implement, but would have been painful to use. So i just set up a log file for the data importing process. QED.

Development Mechanics

I develop data migration code in script/import/archive.rb. I start by running that code...

ruby script/import/archive.rb

...and fixing any bugs. This allows me to develop the data migration code and test it without spinning up a new rails environment every time I want to try something out, but it also means I cant do end to end testing legacy-data to new database data. Once archive.rb is working I move that code into lib/thesis_importer.rb and test to make sure the data is actually making it into the DB correctly.

script/runner script/import/project.rb

It's dirty scripting/data-manipulation work, but it's gotta be done.

Currently Working
Everything from site_properties.xml is now importing correctly (except for the bug_list_column_order). That includes a bunch of data transformations from the old file based schema to the new relational schema. Now when I run project.rb it

creates a new project in the data base
populates all the immediate data like name, projections etc,
links all the constant tables like bug_too_url,
finds or creates a person object for that administrator and links them through the project_administrator table.
writes out a useful log file about what it did
doesn't produce any errors

The problem with bug_list_column_order is that I made a mistake with the original schema. The schema is missing a way to set the default bug column display order for a project. I have it on a per bug basis in the bug_lists table , but not for the project. In the legacy system it used to be set for the project and overridable

by each bug list.

So I need to:

add a t.string :table_column_order to the project table (still considering comma separated string of column ids to handle order. not convinced this is a good idea yet.)
Update the YML file for the fixture for bug table columns to have the column data from the legacy constants file
write a mapping function between site properties and the new settings
put the default order into the Projects table during import.

Stuff I did today

Cleaned up a bunch of strange stale bugs in thesis_importer.rb that were keeping it from running.
Got site administrators populating correctly
Wet sledding with my kids.
Got ButToolUrl constants in the database and importing from site properties working correctly.
Fought with a CPU maxing out issue in Aptana Studio. Googled it silly and still didn't find a solution so I logged a bug. I wish I was doing this on a mac...
Wrote this post.

42 days left till the due date. It's crunch time.

Friday, June 27, 2008

First Line of Code

In the last two weeks I successfully:

Got my development environment set up
Created the model and migrations written based on the schema I've designed.
Prototyped the migration script

Dev environment: I’m using Aptana Studio ( the evolved form of RadRails for Eclipse ) for an IDE. It’s come a long way since RadRails. There are a lot of IDE choices out there. I chose Aptana because most of the developers I know are using some form of Eclipse, so this gives me a chance to try out all the cool plugins they're always talking about. I also installed the Bitnami Ruby Stack which includes MySQL.

Creat Models & Migrations: I created a rails project in Aptana and constructed the scaffold (down to migrations) for each of the tables in my schema.

Some of the scaffolds will be thrown away (join tables) but it’s useful to be able to view and manipulate data from the UI for the moment.

I made a few simple changes/notes to the schema.

In the projects table start_date and projection_start_date seem to be duplicates. I need to go back and validate the meaning of each of those rows
I added an application_id column to data tables such as the bug_tool_urls table. This is a table that has data populated once and it rarely (almost never) changes. Essentially it contains constants. The application_id column is meant to give me strict control over the ID that other products will use to join to this table. I’ve run into a problem where the default id column remembers the last rows ID even if I delete that row. So this makes it difficult for me to easily map to the URLs when I’m importing data.
Even as I added the column I thought of a few ways to easily eliminate the need for the application id. I have a hunch it won’t be long lived.

So the migrations are written and I created a SQLite3 DB using them. I’ll switch to MYSQL when I get closer to a fully functioning system.

Data Import: I also prototyped scripts for migrating the existing text and xml based site data into the DB. I’ve created a helper class that has all of the brains of the migration. It contains POC code that:

Shows how to traverse the directory structure of an existing site
Read XML site properties using REXML
Put those properties into a new Project model and save them to the DB.

The trick to making this work is to run the script using script/runner which spins up the rails environment for this project and gives me access to the DB through the Project model using ActiveRecord. It's a fully script-able environment with all the ease of rails based DB access!

Next steps

#1 Complete Data Import
Everything else depends on the existence of data in the DB. If I proceed with another part of the project first I’ll need to build mock data. Given that I have loads of production data, it’s probably faster and less error prone to finish the data import and use real data when developing the other parts.

#2 Prototype the communication channel between the Central Controller and Generators.

This has three parts

Integrating ActiveMQ with the existing Generator.
Integrating ActiveMQ with the Rails Application
Prototype messaging between the systems

Once the systems are talking I’ll real data sets to make sure message load and sizes are reasonable. I’ve already thought through this, so what I really need is the real data so I can run some realistic tests.

The Rails Administration portal GUI – While this will be fun, the scaffolds are sufficient for supporting the development of the other portions.

The data import and AMQ work will take me through to the begging of August at which time I’ll assess next steps.

Friday, May 2, 2008

DB Schema - First Draft

I've completed the first draft of the schema. These are my note s on the mappings form old XML based fields to the new relational model. I only covered the parts that did not map 1 to 1.

This is a picture of the first draft schema. Click the image to see a larger (readable) version.

Thesis Development

Saturday, February 21, 2009

Development Status Update

Saturday, January 31, 2009

Rails Convention and my silly application_id

Saturday, January 17, 2009

The Final Import Script COMPLETE!!! Ownership Changes

Wednesday, January 14, 2009

Too many rows for history!

Sunday, January 11, 2009

Projections... Done

Thursday, January 8, 2009

Queries && People && Teams... All Importing!

Queries are almost importing...

Sunday, January 4, 2009

Schema Changes Tonight

Thursday, January 1, 2009

State of the Union

Friday, June 27, 2008

First Line of Code

Friday, May 2, 2008

DB Schema - First Draft

About...

Useful Links

Labels

Blog Archive

		days until Thesis Due 2009-05-01

Saturday, February 21, 2009

Saturday, January 31, 2009

Saturday, January 17, 2009

Wednesday, January 14, 2009

Sunday, January 11, 2009

Thursday, January 8, 2009

Sunday, January 4, 2009

Thursday, January 1, 2009

Friday, June 27, 2008

Friday, May 2, 2008

About...

Subscribe To

Useful Links

Labels

Blog Archive