Reversible Migrations

posted: May 6th, 2011 by: Rohit Arondekar

Rails 3.1

Migrations have always been considered one of the many killer features in Rails. And in Rails 3.1 Migrations got a new trick up their sleeve that will greatly simplify the process of maintaining both the up and down logic. If you need a little refresher on what migrations are then I suggest reading the official Rails guide.

Let’s start by looking at what a typical migration looks like in Rails 3.0:

Ruby - db/migrate/20110505090317_create_posts.rb
  class CreatePosts < ActiveRecord::Migration
    def self.up
      create_table :posts do |t|
        t.string :title
        t.text :body
        
        t.timestamps
      end
    end
    
    def self.down
      drop_table :posts
    end
  end

This migration creates a posts table with two fields — title and body of type string and text, respectively. The timestamps helper creates datetime fields — created_at and updated_at for free. To reverse this migration we simply need to drop the posts table. The class method down does precisely this. When Rails is applying a migration it runs the class method up. To reverse the migration (as can be done with rake db:rollback) it runs the class method down.

Two questions come up when you look at this migration —

  • Why class methods instead of plain old instance methods?
  • For simple cases why not just define the up migration and have Rails take care of reversing the migration?

The ever awesome Aaron Patterson thought the same thing and decided to simplify things for you and me.

Introducing change

If you run the following command in Edge Rails, the generated migration will look something like the example below:

console
  (~/code/migrahedron)९ rails g model Post title:string body:text
Ruby - db/migrate/20110505084530_create_posts.rb
  class CreatePosts < ActiveRecord::Migration
    def change
      create_table :posts do |t|
        t.string :title
        t.text :body
  
        t.timestamps
      end
    end
  end

By default Rails 3.1 will generate migrations for models using the change method that will hold the up logic. When a rollback is requested Rails will figure out how to reverse the migration for you merely by examining the ‘up’ direction directives. Go ahead and apply the migration and then rollback. You should see something like the following:

console
  (~/code/migrahedron)९ rake db:migrate
  (in /home/rohit/code/migrahedron)
  ==  CreatePosts: migrating ====================================================
  -- create_table(:posts)
     -> 0.0012s
  ==  CreatePosts: migrated (0.0013s) ===========================================
  

  (~/code/migrahedron)९ rake db:rollback
  (in /home/rohit/code/migrahedron)
  ==  CreatePosts: reverting ====================================================
  -- drop_table("posts")
     -> 0.0007s
  ==  CreatePosts: reverted (0.0008s) ===========================================

Notice how Rails has figured out that in order to reverse the migration, it needs to drop the newly created table.

What about commands that can’t be reversed?

There are certain commands like remove_column that cannot be automatically reversed. This is because the information required to re-create the column is not available in the remove_column command. If Rails encounters such commands while reversing a migration, an ActiveRecord::IrreversibleMigration exception will be raised.

Ruby - db/migrate/20110505101449_remove_title_from_post.rb
  class RemoveTitleFromPost < ActiveRecord::Migration
    def change
      remove_column :posts, :title
    end
  end

If you try rolling back the above migration you will get something like:

console
  (~/code/migrahedron)९ rake db:rollback
  (in /home/rohit/code/migrahedron)
  ==  RemoveTitleFromPost: reverting ============================================
  rake aborted!
  An error has occurred, this and all later migrations canceled:
  
  ActiveRecord::IrreversibleMigration
  
  (See full trace by running task with --trace)

If you want to handle such cases manually you can still define the up and down methods almost like before.

up and down instance methods

The only change to the old up and down methods is that they are now instance methods. Say goodbye to those awkward self.up and self.down method definitions.

Ruby - db/migrate/20110505101557_remove_title_from_post.rb
    class RemoveTitleFromPost < ActiveRecord::Migration
      def up
        remove_column :posts, :title
      end

      def down
        add_column :posts, :title, :string
      end
    end

If you’re the difficult type you can still use the old class methods in your migrations. More importantly your existing migrations will not break.

More magic? :(

If you’re wondering how migration reversal is determined, and in the spirit of Jose Valim’s wish to see all Rails magic deconstructed, I thought I’d give a brief idea as to how Rails is reversing a migration automagically.

The magic, erm I mean heavy lifting, happens in the ActiveRecord::Migration::CommandRecorder class. Basically if you define a change method in your migration and are applying the migration then the commands are executed as normal.

However, while reversing the migration, the commands are recorded and a list of inverse commands is generated and run. Inverse commands are simply commands that perform the opposite of the original command. For example, the inverse of rename_table(old, new) is rename_table(new, old). The logic to obtain the inverse of a command is provided in the class itself. For those commands whose inverse cannot be obtained, ActiveRecord::IrreversibleMigration is raised.
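To make the record-then-invert idea concrete, here is a minimal plain-Ruby sketch. It is not the Rails implementation: TinyRecorder, IrreversibleError and the INVERSES table are hypothetical names made up for illustration, and only three commands are covered.

```ruby
# Hypothetical, simplified recorder; NOT the actual Rails class.
class IrreversibleError < StandardError; end

class TinyRecorder
  # Maps a command name to a lambda that builds its inverse from the args.
  INVERSES = {
    create_table: ->(args) { [:drop_table, [args.first]] },
    rename_table: ->(args) { [:rename_table, args.reverse] },
    add_column:   ->(args) { [:remove_column, args.first(2)] }
  }.freeze

  attr_reader :commands

  def initialize
    @commands = []
  end

  # Record a command instead of executing it.
  def record(name, *args)
    @commands << [name, args]
  end

  # Build the inverse commands, last recorded command first.
  def inverse
    @commands.reverse.map do |name, args|
      inverter = INVERSES.fetch(name) { raise IrreversibleError, "cannot reverse #{name}" }
      inverter.call(args)
    end
  end
end

recorder = TinyRecorder.new
recorder.record(:create_table, :posts)
recorder.record(:rename_table, :posts, :articles)
p recorder.inverse
# => [[:rename_table, [:articles, :posts]], [:drop_table, [:posts]]]
```

Note that the inverses come out in reverse order, and that recording a command with no entry in the table (remove_column, say) makes inverse raise, which mirrors the irreversible case described earlier.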

That was a very simple overview of what is happening behind the scenes. I encourage you to take a look at the code for yourself to understand how it works.

ActiveRecord Identity Map

posted: April 21st, 2011 by: Josh Kalderimis

Rails 3.1

Due to some recently discovered issues with updating objects in associations, the Identity Map will be turned off by default in Rails 3.1. You can still turn the Identity Map on, but it is recommended you read the documentation for further information.

If you’ve been using Rails for a while now you may be familiar with Active Record’s query cache. The query cache is a powerful part of Active Record which reduces unnecessary SQL calls and provides general speed improvements, especially when dealing with associations. The problem with the query cache, however, is that when the same record is retrieved from the database twice, two in-memory objects will still be created.

rails console
  user1 = User.find(1) # => #<User id: 1, name: "Josh">
  user2 = User.find(1) # => #<User id: 1, name: "Josh">

  user1 == user2 # => true, b/c AR::Base recognizes that
                 # they have the same primary key

  user1.object_id == user2.object_id # => false, b/c these are two
                                     # different in-memory objects
log/development.log
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT 1  [["id", 1]]
  CACHE (0.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT 1  [["id", 1]]

Thanks to the fantastic work of Emilio Tagua during the Ruby Summer of Code 2010, Active Record in 3.1 will gain an identity map. What’s an identity map you ask? An identity map keeps a collection of previously instantiated records and returns the object associated with the record if a request is made for it again.

rails console
  user1 = User.find(1) # => #<User id: 1, name: "Josh">
  user2 = User.find(1) # => #<User id: 1, name: "Josh">

  user1 == user2 # => true

  user1.object_id == user2.object_id # => true, b/c these really are
                                     # the same in-memory objects
log/development.log
  User Load (2.2ms)  SELECT "users".* FROM "users" LIMIT 1
  User with ID = 1 loaded from Identity Map

Why is having the same in-memory object returned important? Because it ensures that there is only one copy of a model instance floating around your system at any one time. Without this assurance, modifications made to a model object in one context won’t be reflected in a copy that exists in another context, which can produce hard-to-trace bugs and inconsistencies.
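If the idea still feels abstract, a toy version in plain Ruby may help. This is only a sketch of the pattern, not Active Record’s code: Record and TinyIdentityMap are hypothetical names, and the block passed to fetch stands in for the database load.

```ruby
# Hypothetical stand-in for a model; NOT Active Record.
Record = Struct.new(:id, :name)

class TinyIdentityMap
  def initialize
    @records = {}
  end

  # Return the cached object for this key, or build and remember it.
  def fetch(klass, id)
    @records[[klass, id]] ||= yield
  end

  # Forget everything, e.g. at the end of a request.
  def clear
    @records.clear
  end
end

map = TinyIdentityMap.new
user1 = map.fetch(Record, 1) { Record.new(1, "Josh") }
user2 = map.fetch(Record, 1) { Record.new(1, "Josh") }

user1.equal?(user2)   # => true, same in-memory object
user1.name = "Joshua"
user2.name            # => "Joshua", the change is visible everywhere
```

The second fetch never runs its block, so there is only ever one object per class and id until the map is cleared.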

The identity map is created on a per-request basis and is flushed at the completion of the request (as can be expected, the implementation is thread-safe). You can also use an identity map in the console, background worker, or manually within a request (if it’s turned off by default).

Ruby - app/models/user.rb
  ActiveRecord::IdentityMap.use do
    user = User.find(id)
    user.do_that_heavy_thing_you_do!
  end
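For the curious, a block-scoped cache like this can be sketched in plain Ruby with a thread-local store. To be clear, this is an illustration of the general pattern under my own assumptions, not how Rails implements ActiveRecord::IdentityMap.use; ScopedCache and its methods are hypothetical.

```ruby
# Hypothetical sketch of a block-scoped cache; NOT the Rails code.
module ScopedCache
  KEY = :scoped_cache_store

  # Enable a fresh cache for the duration of the block, then restore
  # whatever was there before (Thread.current[] is fiber-local in Ruby).
  def self.use
    previous = Thread.current[KEY]
    Thread.current[KEY] = {}
    yield
  ensure
    Thread.current[KEY] = previous
  end

  def self.enabled?
    !Thread.current[KEY].nil?
  end

  # Cache the block's result while enabled; otherwise just run the block.
  def self.fetch(key)
    store = Thread.current[KEY]
    return yield unless store
    store[key] ||= yield
  end
end

ScopedCache.use do
  a = ScopedCache.fetch([:user, 1]) { Object.new }
  b = ScopedCache.fetch([:user, 1]) { Object.new }
  a.equal?(b) # => true inside the block
end
ScopedCache.enabled? # => false once the block has finished
```

The ensure clause is what gives the "flushed at completion" behaviour: the store goes away even if the block raises.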

Rails 3.1 will come with the identity map built in but turned off by default. You can try it out for yourself by living on the edge and adding the following to application.rb:

Ruby - config/application.rb
  config.active_record.identity_map = true

And while the query cache is all about speed improvements, the identity map is primarily focused on consistency, thus they go hand in hand.


Custom ActiveRecord Attribute Serialization

posted: March 9th, 2011 by: Jeff Kreeftmeijer

Rails 3.1

As you might know already, ActiveRecord lets you store serialized objects by using serialize in your model:

Ruby - app/models/user.rb
  class User < ActiveRecord::Base
    serialize :interests
  end

These serializable attributes can then be set to any Ruby type, such as an array:

rails console
user = User.create!(:interests => ['dinosaurs', 'lasers'])
user.reload.interests # => ["dinosaurs", "lasers"]

Under the hood, this array is automatically converted to YAML. You won’t notice this unless you check what’s being inserted into your database:

log/development.log
INSERT INTO "users" ("updated_at", "interests", "created_at") VALUES ('2011-03-05 13:14:05.054192', '---
- dinosaurs
- lasers
', '2011-03-05 13:14:05.054192')
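You can reproduce that conversion outside of Rails with Ruby’s standard YAML library. This small sketch only shows the round trip for the array used above; it says nothing about how Active Record wires it up internally.

```ruby
require 'yaml'

interests = ['dinosaurs', 'lasers']

# Dump the array to the same YAML text seen in the INSERT statement.
yaml = YAML.dump(interests)
puts yaml
# ---
# - dinosaurs
# - lasers

# Loading the text gives back an equal array.
YAML.load(yaml) == interests # => true
```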

Using arbitrary objects

Aaron Patterson recently did some work for Rails 3.1 which allows you to use arbitrary objects to serialize your attributes. All they need to do is respond to load and dump. This allows you to specify a custom encoding for your model’s serialized fields. For example, here is what a Base64 encoding might look like:

Ruby - app/models/user.rb
class User < ActiveRecord::Base
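  class Base64
    # Decode the Base64 text read from the database (nil stays nil)
    def load(text)
      return unless text
      text.unpack('m').first
    end

    # Encode the value as Base64 before it is written to the database
    def dump(text)
      [text].pack 'm'
    end
  end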

  serialize :bank_account_number, Base64.new
end

Just like the default YAML serialization, ActiveRecord won’t bother you with any serialized data, unless you check the SQL query:

rails console
user = User.create!(:bank_account_number => "0000001 00000011 0000001 00000011")
user.reload
user.bank_account_number # => "0000001 00000011 0000001 00000011"
log/development.log
INSERT INTO "users" ("updated_at", "bank_account_number", "created_at") VALUES ('2011-03-05 17:12:01.459862', 'MDAwMDAwMSAwMDAwMDAxMSAwMDAwMDAxIDAwMDAwMDEx\n', '2011-03-05 17:12:01.459862')

Just remember, YAML serialization is still used by default, but creating your own serialization object is as simple as load and dump.
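As a second illustration of the load and dump contract, here is a sketch of a JSON coder in plain Ruby. JsonCoder is a hypothetical name of my own, not something Rails ships, and the example only exercises the object on its own rather than through a model.

```ruby
require 'json'

# Hypothetical coder: any object responding to load and dump would do.
class JsonCoder
  # Called when reading from the database; nil columns stay nil.
  def load(text)
    return nil if text.nil?
    JSON.parse(text)
  end

  # Called when writing to the database.
  def dump(value)
    JSON.generate(value)
  end
end

coder  = JsonCoder.new
data   = { 'interests' => ['dinosaurs', 'lasers'] }
stored = coder.dump(data)

puts stored                 # {"interests":["dinosaurs","lasers"]}
coder.load(stored) == data  # => true, the round trip preserves the value
```

Following the Base64 example above, an instance of such a class would presumably be passed as the second argument to serialize.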

Links:

Custom serialization in action video

The Rails commits here and here

Template Inheritance

posted: January 12th, 2011 by: Ryan Daigle

Rails 3.1

Now that the internals of Rails 3 are a little more hospitable to changes we now have a long-standing feature request finally implemented - inherited templates. As the name implies, inherited templates make template lookup follow the controller inheritance hierarchy if it can’t find a template for the current controller. This is probably best described with a basic code example. Assuming we have the following controller hierarchy (basically PostsController inherits from AssetsController):

Ruby - assets_controller.rb
class AssetsController < ApplicationController
  # Assume index, show etc... action definitions
end
Ruby - posts_controller.rb
class PostsController < AssetsController
  # Assume index, show etc... action definitions
end

Assume there are asset templates for show and index but only an index template for posts:

Bash - directory listings
myapp > ls app/views/assets
index.html.haml     show.html.haml

myapp > ls app/views/posts
index.html.haml

In this scenario any request to the show action of the posts_controller will be rendered using the views/assets/show template. index requests will be rendered as expected since both controllers have their own index templates.
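The lookup rule can be sketched in a few lines of plain Ruby. This is a simplified model of the behaviour described above, not the Rails view resolver: the controllers are empty stand-in classes, TEMPLATES is a hard-coded list instead of the filesystem, and view_directory and resolve_template are hypothetical helpers.

```ruby
# Plain Ruby stand-ins for the controllers in the example above.
class ApplicationController; end
class AssetsController < ApplicationController; end
class PostsController < AssetsController; end

# Templates that exist in the example, written as "directory/action".
TEMPLATES = ['assets/index', 'assets/show', 'posts/index'].freeze

# Hypothetical helper: PostsController -> "posts".
def view_directory(controller)
  controller.name.sub(/Controller\z/, '').downcase
end

# Walk up the inheritance chain until a matching template is found.
def resolve_template(controller, action, templates = TEMPLATES)
  klass = controller
  while klass && klass != Object
    candidate = "#{view_directory(klass)}/#{action}"
    return candidate if templates.include?(candidate)
    klass = klass.superclass
  end
  nil
end

resolve_template(PostsController, 'index') # => "posts/index"
resolve_template(PostsController, 'show')  # => "assets/show"
```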

Template inheritance also applies to partial lookups and does not touch layout lookup, which is already based on hierarchical lookup. (So basically all template types have inherited lookups.)

While a seemingly small feature, this new inherited lookup of templates allows you to reuse whole parts of common view logic without resorting to an amalgamation of partials to accomplish the same thing.

The Skinny on Scopes (Formerly named_scope)

posted: February 23rd, 2010 by: Ryan Daigle

Rails 3.0

The source for the examples contained in this article is located at: http://github.com/rwdaigle/edgerails-support/tree/master/the-skinny-on-scopes-formerly-named-scope/

I remember my heart fluttering with a boyish crush the first time I saw Nick Kallen’s has_finder functionality make it into Rails in the form of named_scope. named_scope quickly made its way into my toolset as a great way to encapsulate reusable, chainable bits of query logic. While it had its downsides (namely its lack of first-class chain support for the likes of :joins and :include) it redefined how I thought about structuring my model logic. Once you taste the chainable goodness of named_scope you never go back.

So here we are with Rails 3 completely refactoring the internals of ActiveRecord - what’s up with our beloved named_scope? Well, the simple answer is that it’s been renamed to scope and you can use it just as you’re used to … but that’s taking the easy way out. Let’s see what else we can do with scope in Rails 3.

Basic Usage

Let’s assume a standard Post model with published_at datetime field along with title and body (to follow along in code, see the accompanying project in github).

In Rails 2.x here’s how we’d have to define the self-explanatory published and recent named scopes:

Ruby - post.rb
class Post < ActiveRecord::Base

  named_scope :published, lambda { 
    { :conditions =>
      ["posts.published_at IS NOT NULL AND posts.published_at <= ?", Time.zone.now]
    }
  }  
  named_scope :recent, :order => "posts.published_at DESC"
end

The reason we need to use a lambda here is that it delays the evaluation of the Time.zone.now argument to when the scope is actually invoked. Without the lambda, the time used in the query logic would be the time at which the class was first evaluated, not the time at which the scope is invoked.
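The eager-versus-lazy difference is easy to see in plain Ruby. In this sketch a hypothetical FakeClock stands in for Time.zone.now so the values are predictable; nothing here is Rails-specific.

```ruby
# Hypothetical clock that advances by one tick on every read.
class FakeClock
  def initialize
    @now = 0
  end

  def now
    @now += 1
  end
end

clock = FakeClock.new
eager = clock.now             # evaluated once, when this line runs
lazy  = lambda { clock.now }  # evaluated again on every call

eager       # => 1
lazy.call   # => 2
lazy.call   # => 3
eager       # => 1, still the value captured at definition time
```

A scope defined without the lambda behaves like eager: whatever the clock said when the class body ran is baked in for good.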

With Rails 3 the bulk of ActiveRecord is now based on the Relation class. Think of relation as named_scope on steroids, weaving chainable query logic into the very fabric of ActiveRecord.

You can see how to use the individual where, order etc… commands of Relation in Pratik’s great writeup of the new query interface as well as in this Railscast. Understanding these is important as the new scope is built upon them.

Let’s see how - here’s how the two named scopes from our previous Post example will look in Rails 3:

Ruby - post.rb
class Post < ActiveRecord::Base

  scope :published, lambda { 
    where("posts.published_at IS NOT NULL AND posts.published_at <= ?", Time.zone.now)
  }  
  scope :recent, order("posts.published_at DESC")
end

While the bulk of the logic is the same - the SQL string portions - you start to see how scopes use the new query interface directly to create reusable query logic versus constructing an options hash as was done in Rails 2. This really is our first glimpse of how much more flexible the new query interface allows our scopes to be. No longer are they a slightly different construct than your normal query methods. They are now built upon the very same query methods that you would use were you to execute a query directly. This consistency is now prevalent all throughout ActiveRecord.
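To get a feel for why chaining composes so cleanly, here is a tiny plain-Ruby model of a relation. It is a rough sketch under my own assumptions, nowhere near the real Relation class: TinyRelation is a hypothetical name and it only understands where and order.

```ruby
# Hypothetical, heavily simplified stand-in for a relation.
class TinyRelation
  attr_reader :table, :wheres, :orders

  def initialize(table, wheres = [], orders = [])
    @table  = table
    @wheres = wheres
    @orders = orders
  end

  # Each builder returns a new relation, so chains never mutate a receiver.
  def where(condition)
    TinyRelation.new(table, wheres + [condition], orders)
  end

  def order(clause)
    TinyRelation.new(table, wheres, orders + [clause])
  end

  def to_sql
    sql = "SELECT * FROM #{table}"
    sql += " WHERE #{wheres.join(' AND ')}" unless wheres.empty?
    sql += " ORDER BY #{orders.join(', ')}" unless orders.empty?
    sql
  end
end

posts     = TinyRelation.new('posts')
published = posts.where('posts.published_at IS NOT NULL')
recent    = published.order('posts.published_at DESC')

recent.to_sql
# => "SELECT * FROM posts WHERE posts.published_at IS NOT NULL ORDER BY posts.published_at DESC"
posts.to_sql
# => "SELECT * FROM posts"
```

Because every call hands back a fresh object, a named scope is really just a saved point in such a chain that others can keep building on.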

And there’s more…

Scope Reusability

Suppose we want to update our recent scope to only include published posts. We’ve already defined what published means and shouldn’t have to redefine it to create a new scope. Well, you can also chain scopes within scope definitions themselves as we’ll do here with the new recent and published_since scopes.

Ruby - post.rb
class Post < ActiveRecord::Base
  
  scope :published, lambda { 
    where("posts.published_at IS NOT NULL AND posts.published_at <= ?", Time.zone.now)
  }
  scope :published_since, lambda { |ago|
    published.where("posts.published_at >= ?", ago)
  }
  
  scope :recent, published.order("posts.published_at DESC")
end

Ok, now we’re getting warmed up.

Dynamic Scope Construction

I’ve been in love with scoped, the anonymous named_scope constructor in Rails 2.3, for some time now, using it to create dynamic and chainable scopes on an as-needed basis. One use case you see a lot for this type of functionality is creating a search method that you can still append other query manipulations onto.

For instance, to search our posts we can create this method which will return a scope for your caller to further filter (notice the use of scoped to start the chain off with an innocuous scope upon which others can be appended):

Ruby - post.rb
class Post < ActiveRecord::Base
  
  class << self
    
    # Search the title and body fields for the given string.
    # Start with an empty scope and build on it for each attr.
    # (Allows for easy extraction of searchable fields definition
    # in the future)
    def search(q)
      [:title, :body].inject(scoped) do |combined_scope, attr|
        combined_scope.where("posts.#{attr} LIKE ?", "%#{q}%")
      end
    end
  end
end

The use of inject here somewhat obfuscates the intent of the method if you’re not used to looking at such iterations - here’s an easier to follow version with the searchable fields more hard coded (which actually doesn’t even use an anonymous scope to get bootstrapped):

Ruby - post.rb
class Post < ActiveRecord::Base
  
  class << self
    
    # The less-slick but, perhaps, more obvious version
    def search(q)
      query = "%#{q}%"
      where("posts.title LIKE ?", query).where("posts.body LIKE ?", query)
    end
  end
end
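If inject is the unfamiliar part, it may help to see the same accumulation on plain data. In this sketch an array of condition pairs stands in for the scope; search_conditions is a hypothetical helper, not part of the Post model.

```ruby
# Build one [sql_fragment, bind_value] pair per searchable field.
# The accumulator starts empty and grows by one pair per iteration.
def search_conditions(fields, q)
  pattern = "%#{q}%"
  fields.inject([]) do |conditions, attr|
    conditions + [["posts.#{attr} LIKE ?", pattern]]
  end
end

search_conditions([:title, :body], 'week')
# => [["posts.title LIKE ?", "%week%"], ["posts.body LIKE ?", "%week%"]]
```

The search method does the same thing, except that the accumulator is a scope and each step appends a where to it.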

Since we’re building upon the chainable goodness of the new query interface (think of scopes now as named bundles of the new ActiveRelation construct), we can do the following with the search method:

Ruby - irb session
# What's in the db, titles ~= publish date
Post.all.collect(&:title) #=> ["1 week from now", "Now", "1 week ago", "2 weeks ago"]
Post.published.collect(&:title) #=> ["Now", "1 week ago", "2 weeks ago"]

# Search combinations
Post.search('1').collect(&:title) #=> ["1 week from now", "1 week ago"]
Post.search('1').published.collect(&:title) #=> ["1 week ago"]
Post.search('w').published_since(10.days.ago).collect(&:title) #=> ["Now", "1 week ago"]
Post.search('w').order('created_at DESC').limit(2).collect(&:title) #=> ["2 weeks ago", "1 week ago"]

You can imagine a scenario where more complex query-string support could be built, all using anonymous scopes.

Feels great, huh? It also feels a lot like the utility_scopes gem I released a while ago, which was my attempt to package up the chainable goodness of named_scope for common query operations. Rest assured, there’s a much smoother implementation under the covers here in Rails 3 than just some hacks on top of named_scope.

Cross-Model Scopes

Scopes are great for operating solely on the columns of a singular class’s table, but they can also be used to package cross-model queries (i.e. any SQL that would require a join). Let’s add in users (who can author and comment on posts) to the mix and write some scopes on User that will fetch only users that have authored published posts as well as users that have commented on a post:

Ruby - user.rb
class User < ActiveRecord::Base

  has_many :posts, :foreign_key => :author_id
  has_many :comments
  
  # Get all users that have published a post
  scope :published, lambda {
    joins("join posts on posts.author_id = users.id").
    where("posts.published_at IS NOT NULL AND posts.published_at <= ?", Time.zone.now).
    group("users.id")
  }
  
  # Get all users that have commented on a post
  scope :commented, joins("join comments on comments.user_id = users.id").group("users.id")
end

Also, as Steffen pointed out in the comments, ActiveRelation is smart enough to know how to do a join based on an association definition, allowing us to collapse the joins relations from SQL strings to an association reference:

Ruby - user.rb
class User < ActiveRecord::Base
  
  # Get all users that have published a post
  scope :published, lambda {
    joins(:posts).              # No need to write your own SQL
    where("posts.published_at IS NOT NULL AND posts.published_at <= ?", Time.zone.now).
    group("users.id")
  }
  
  # Get all users that have commented on a post
  scope :commented, joins(:comments).group("users.id") # Just reference :comments
end

It’s a good practice to always refer to the full table_name.column_name when building scopes versus just the column_name itself (i.e.: posts.published_at vs. just published_at in the example above). This allows for unambiguous column references - especially important when building cross-model scopes where columns from more than one table are joined.

To be extra-flexible you can always invoke table_name in place of the hard-coded table name, though I confess this is a step I rarely take the time to implement myself: where("#{table_name}.published_at IS NOT NULL")

Since we’ve got the full arsenal of ActiveRelation operators at our disposal in scopes, we can do joins and group bys within scopes that will be safely chained in complex queries - something where the old named_scope crapped the bed:

Ruby - irb session
# Get all users that have a post published
User.published.collect(&:username) #=> ["tim", "dave"]
User.published.to_sql
  #=> SELECT "users".* FROM "users" join posts on posts.author_id = users.id
  #   WHERE (posts.published_at IS NOT NULL AND posts.published_at <= '2010-02-22 21:33:00.892308')
  #   GROUP BY users.id
  
# Get all users that have commented on a post
User.commented.collect(&:username) #=> ["ryan", "john", "tim", "dave"]
User.commented.to_sql
  #=> SELECT "users".* FROM "users" join comments on comments.user_id = users.id
  #   GROUP BY users.id
  
# Combine them to get all authors that have also commented
User.published.commented.collect(&:username) #=> ["tim", "dave"]
User.published.commented.to_sql
  #=> SELECT "users".* FROM "users"
  #   join posts on posts.author_id = users.id
  #   join comments on comments.user_id = users.id
  #   WHERE (posts.published_at IS NOT NULL AND posts.published_at <= '2010-02-22 21:33:00.892308')
  #   GROUP BY users.id

As I’ve done here, use scope#to_sql to peek at what SQL the scope will execute. Very useful for debugging purposes.

Scope-based Model CRUD

Since ActiveRelation lets you invoke all the builder/update/destroy methods on a relation that you’re used to using directly against your models, that power is also available at the end of a scope/scope-chain. Let’s play around with our post scopes and use them to do more than just query:

Ruby - irb session
# Increment the views_count for all published posts
Post.published.collect(&:views_count) #=> [59, 71, 42]
Post.published.update_all("views_count = views_count + 1")
Post.published.collect(&:views_count) #=> [60, 72, 43]

# Nobody cares about unpublished posts
Post.unpublished.size #=> 1
Post.unpublished.destroy_all
Post.unpublished.size #=> 0

You can also create a new model from existing scopes - suppose we have a (very contrived) scope that gets only posts with a certain title:

Ruby - post.rb
class Post < ActiveRecord::Base
  
  # Ludacris
  scope :titled_luda, where(:title => 'Luda')
end

We can use this scope to directly build instances (as well as create, new, create! etc…):

Ruby - irb session
Post.titled_luda.size #=> 0
Post.titled_luda.build
  #=> #<Post id: nil, title: "Luda", ...>

In order to use the creation/builder methods on a scope, the scope should directly define attribute equality using a `where` relation and the hash form of the attribute values, as was done above.

Specifying where("title = 'Luda'") would not have propagated the attribute values to newly constructed instances.
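One way to picture why the hash form matters: a hash is data that can be read back, while a SQL string is opaque. The sketch below models that distinction in plain Ruby. It is my own illustration, not how ActiveRecord is written; TinyScope and build_attributes are hypothetical names.

```ruby
# Hypothetical sketch: hash conditions are kept as data, strings are opaque.
class TinyScope
  def initialize(conditions = [])
    @conditions = conditions
  end

  def where(condition)
    TinyScope.new(@conditions + [condition])
  end

  # Only hash conditions can be turned back into attribute values.
  def build_attributes
    @conditions.grep(Hash).inject({}) { |attrs, cond| attrs.merge(cond) }
  end
end

from_hash   = TinyScope.new.where(:title => 'Luda').build_attributes
from_string = TinyScope.new.where("title = 'Luda'").build_attributes

from_hash[:title]   # => "Luda"
from_string.empty?  # => true, nothing to copy onto a new instance
```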

Scopes really can be thought of now as named packages of both query and construction logic. Very powerful.

Crazy Town

One thing that’s always bugged me is how the logic for what makes a Post published is split between scopes in both the Post class and the User class. To refresh our collective memories:

Ruby - post.rb
class Post < ActiveRecord::Base
  
  scope :published, lambda { 
    where("posts.published_at IS NOT NULL AND posts.published_at <= ?", Time.zone.now)
  }
end

And:

Ruby - user.rb
class User < ActiveRecord::Base
  
  scope :published, lambda {
    joins(:posts).
    where("posts.published_at IS NOT NULL AND posts.published_at <= ?", Time.zone.now).
    group("users.id")
  }
end

Most good developers will immediately cringe at the duplication of the where("posts.published_at IS NOT NULL AND posts.published_at <= ?", Time.zone.now) relation.

Thanks to a tip by Railscast’s Ryan Bates, there’s a pretty slick way to refer to, and combine, scope logic: the merge method, aliased as ‘&’. Let’s look at how we can use scope#& to refer to the query logic of the Post.published scope from within our User.published scope:

Ruby - user.rb
class User < ActiveRecord::Base
  
  scope :published, lambda {
    joins(:posts).group("users.id") & Post.published
  }
end

Just so we’re clear what happens when you merge relations/scopes with the & operator, let’s look at the resulting SQL:

Ruby - irb session
User.published.to_sql
  #=> SELECT users.* FROM "users"
  #   INNER JOIN "posts" ON "posts"."author_id" = "users"."id"
  #   WHERE (posts.published_at IS NOT NULL AND posts.published_at <= '2010-02-27 02:55:45.063181')
  #   GROUP BY users.id

Notice how the conditions defined within Post.published are merged into the joins and group relations of the User.published scope? Very nice. And merging works with all the mergeable relations, not just where conditions we used here.
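Conceptually, a merge just combines each part of two queries. Here is a plain-Ruby sketch of that idea, assuming a relation can be reduced to lists of joins, wheres and groups. QueryParts is a hypothetical struct for illustration and has nothing to do with the real merge implementation.

```ruby
# Hypothetical value object holding the parts of a query.
QueryParts = Struct.new(:joins, :wheres, :groups) do
  # Combine two sets of query parts, dropping exact duplicates.
  def merge(other)
    QueryParts.new(
      (joins + other.joins).uniq,
      (wheres + other.wheres).uniq,
      (groups + other.groups).uniq
    )
  end
  alias_method :&, :merge
end

user_published = QueryParts.new(['posts'], [], ['users.id'])
post_published = QueryParts.new([], ['posts.published_at IS NOT NULL'], [])

combined = user_published & post_published
combined.joins   # => ["posts"]
combined.wheres  # => ["posts.published_at IS NOT NULL"]
combined.groups  # => ["users.id"]
```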

Summary

This post somewhat glosses over the new query interface for ActiveRecord in Rails 3 to get to the meat of using scopes. However, none of the scoped yumminess could have happened without the slick new underpinnings of ActiveRecord. So, if you’re still a little confused about all this, definitely read some more about ActiveRecord before jumping into scopes. Once you do have that foundation, however, you will use scopes on a very regular basis.


Notes From The First Rails Online Conference

posted: February 18th, 2010 by: Ryan Daigle

Rails 3.0

Earlier today the inaugural Rails Online Conf occurred and, Webex being a bag of hurt aside, it was a great experience. Basically, we got a high-level rundown of most of the big changes (internal and external) in Rails 3 from an all-star lineup of Railstuds. The slides are up on the site but if you want to save a single click-through, here they are again:

Don’t forget to check out Jeremy’s upcoming Rails 3 Upgrade Handbook if you’re in the market for test driving all the sweetness Rails 3 has to offer. Jeremy’s a great writer and this 120-page manual is sure to impress.

The RailsConf program committee of Chad and Ben deserves a big “thanks” for putting on the online conf. My hope is that mini-events like this become more frequent in the future.

If you missed out on this event and aren’t a fan of visuals, you can always download the audio recording when it comes out.