Zero Downtime Deploys for Rails
On Web
Pedro Belo at Railsconf 2012
Ruby on Rails-focused, but equally applicable to webapps in JavaScript/Node.js and beyond. Describes techniques for deployments that do not require downtime, even when they include database schema changes. While downtime for database changes can, in my view, be excused, the HTML-compatibility sections are very applicable to JavaScript-heavy sites that may run for extended periods in the client's browser. Some of the suggested technical solutions are irrelevant in 2018, but the overall approach is timeless.
Database Compatibility
Consider removing a model attribute and its related database column. Along with the necessary code changes, you create a database migration:
class ApparentlyHarmlessMigration < ActiveRecord::Migration
  def self.up
    remove_column :users, :notes
  end
end
Upon deploying, however, you're likely to see your existing webapp instances raise database errors:
PGError: ERROR: column "notes" does not exist
Ruby on Rails' ActiveRecord, like any other ORM, caches columns for performance in the production environment. App instances servicing requests that started before your deployment therefore still reference the removed notes column. Note this is likely the case even if you're not using a full ORM and are passing plain objects to a query library (akin to query.table("users").insert(user)), as those objects probably still refer to non-existent columns.
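The failure can be reproduced without a database. The following is a minimal sketch, assuming a hypothetical in-memory FakeTable and an AppInstance that caches the column list once at boot; neither class is part of Rails, and the error type merely stands in for the PGError above.

```ruby
# Hypothetical in-memory stand-in for a database table.
class FakeTable
  attr_reader :columns

  def initialize(columns)
    @columns = columns
    @rows = []
  end

  def drop_column(name)
    @columns.delete(name)
  end

  # Rejects rows that mention a column the table no longer has.
  def insert(row)
    unknown = row.keys - @columns
    raise ArgumentError, "column \"#{unknown.first}\" does not exist" unless unknown.empty?
    @rows << row
  end
end

# Hypothetical app instance that caches the column list at boot,
# much like an ORM does in production.
class AppInstance
  def initialize(table)
    @table = table
    @cached_columns = table.columns.dup
  end

  # Builds the row from the cached column list, not the live schema.
  def create_user(attrs)
    row = @cached_columns.map { |column| [column, attrs[column]] }.to_h
    @table.insert(row)
  end
end
```

An instance booted before drop_column("notes") keeps sending notes and fails, while an instance booted afterwards works.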
Pedro proposes approaching this problem through hot compatibility: ensuring that every two consecutive deployments are compatible with each other and able to run in parallel. For example, removing a column requires 2 deployments:
- Write to notes.
- Stop referencing notes while leaving it in the database:
  class User
    def self.columns
      super.reject {|column| column.name == "notes" }
    end
  end
- Remove notes from the database.
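The middle step relies on super returning the full column list, which the subclass then filters. Here is a minimal sketch of that mechanism in plain Ruby, assuming a hypothetical BaseModel that stands in for ActiveRecord::Base and a hard-coded column list:

```ruby
Column = Struct.new(:name)

# Hypothetical stand-in for ActiveRecord::Base: reports every column
# that currently exists in the table.
class BaseModel
  def self.columns
    [Column.new("id"), Column.new("email"), Column.new("notes")]
  end

  def self.column_names
    columns.map(&:name)
  end
end

# The model hides "notes" from the rest of the app while the column
# still exists in the database.
class User < BaseModel
  def self.columns
    super.reject { |column| column.name == "notes" }
  end
end
```

With this in place, User.column_names no longer includes notes, so code running in this deployment stops reading or writing it and the column can be dropped in the next one. Newer Rails versions offer ignored_columns for the same purpose.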
For renaming columns, Pedro suggests temporarily writing to both columns, requiring a total of 3 deployments:
- Read and write to notes.
- Add column remarks. Read from notes, write to notes and remarks.
- Populate remarks where needed; read and write only to remarks.
- Remove notes.
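The dual-write stage and the backfill that follows can be sketched in plain Ruby. This is an illustration only, assuming a hypothetical User that wraps a row hash rather than a real ActiveRecord model:

```ruby
# Hypothetical model wrapping one row of the users table as a hash.
class User
  attr_reader :row

  def initialize(row = {})
    @row = row
  end

  # Second stage: keep reading from the old column, which is still
  # the only one guaranteed to be populated.
  def remarks
    @row[:notes]
  end

  # Second stage: write to both columns so that the previous and the
  # next deployment both see current data.
  def remarks=(value)
    @row[:notes] = value
    @row[:remarks] = value
  end
end

# Third stage: copy notes into remarks for rows written before the
# dual-write deployment went live.
def backfill_remarks(rows)
  rows.each { |row| row[:remarks] = row[:notes] if row[:remarks].nil? }
end
```

Once the backfill has run, the reader and writer can switch to remarks alone, and notes can be removed in a later deployment.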
HTML Compatibility
Beyond model and database mismatches, you're likely to see problems with HTML forms and API requests that were submitted from a page rendered by a previous version of the app. Take a form on the signin page with a renamed field for example:
  <form method="post" action="/session">
-   <input name="username">
+   <input name="email">
    <button>Sign in</button>
  </form>
In a fully server-side rendered app this is perhaps not that problematic: submitting the form would merely inform the visitor that they haven't filled in their email field while presenting them with an updated form [1]. A JavaScript-heavy signin form (or a single-page app), on the other hand, is bound to just break entirely. In both server- and client-side approaches there's a high risk of losing the person's submitted information, and that's a cardinal sin of user experience.
A solution is to be backwards compatible with old requests and migrate parameters in the controller:
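class AuthController < ApplicationController
  # Accept requests from forms rendered by the previous deployment.
  def filtered_params
    params.dup.tap do |migrated|
      # Only migrate when the old field is present, so that requests
      # from the new form keep their email value.
      migrated[:email] = migrated.delete(:username) if migrated.key?(:username)
    end
  end
end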
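The same idea can be tried outside of Rails. Below is a minimal sketch on plain hashes, assuming a hypothetical migrate_legacy_params helper and a renames mapping that are not part of any framework. It moves an old field only when it is present, so requests from the new form pass through untouched.

```ruby
# Hypothetical helper: returns a copy of params with old field names
# moved to their new names. `renames` maps old name => new name.
def migrate_legacy_params(params, renames: { username: :email })
  params.dup.tap do |migrated|
    renames.each do |old_name, new_name|
      next unless migrated.key?(old_name)

      value = migrated.delete(old_name)
      # Prefer the new field if a request happens to carry both.
      migrated[new_name] = value unless migrated.key?(new_name)
    end
  end
end

old_form = { username: "pedro@example.com", password: "secret" }
new_form = { email: "pedro@example.com", password: "secret" }

migrate_legacy_params(old_form) # username is moved to email
migrate_legacy_params(new_form) # returned unchanged
```

Skipping absent fields matters: moving a missing username unconditionally would overwrite the email sent by the new form with nil.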
There's also a potential problem with page assets. If you're removing or renaming CSS stylesheets or JavaScript files, it's possible a person loading your page in the middle of a deployment will reference assets you've removed in the latest version. Functional and visual anomalies can also happen if you change stylesheets or scripts in ways that are incompatible with the HTML of the previous deployment. Pedro proposes versioning assets and keeping older assets around for some time.
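One common way to version assets is to embed a digest of the file's content in its name, so that each deployment's HTML points at exactly the files it was built with. The following is a minimal sketch of that idea, assuming hypothetical helpers fingerprinted_name and assets_to_retain; it is not the Rails asset pipeline's actual implementation.

```ruby
require "digest/md5"

# Hypothetical helper: derive a versioned file name from the content,
# e.g. "application.css" becomes "application-<md5 digest>.css".
def fingerprinted_name(logical_name, content)
  extension = File.extname(logical_name)
  base = File.basename(logical_name, extension)
  "#{base}-#{Digest::MD5.hexdigest(content)}#{extension}"
end

# Hypothetical cleanup rule: `deployments` lists the asset names of
# each deployment, oldest first. Keep the files of the last `keep`
# deployments so pages rendered by the previous version still load.
def assets_to_retain(deployments, keep: 2)
  deployments.last(keep).flatten.uniq
end
```

Because the name changes whenever the content does, an old page never receives a new, incompatible stylesheet under the old URL.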
Migration Strategies
Once you get your app's consecutive versions compatible, it's important to ensure your deployment process doesn't prevent existing app instances from servicing requests. It's easy to accidentally lock entire database tables through migrations.
For PostgreSQL, the locking implications are:
Operation | Performance |
---|---|
ADD COLUMN | O(N) lock if you set a default value (writes to every row). O(1) otherwise. |
ALTER COLUMN | O(N) lock. |
DROP COLUMN | O(1) |
CREATE INDEX | O(N) lock unless you create it via CREATE INDEX CONCURRENTLY. |
DROP INDEX | O(1) |
To minimize the time a lock is held on the entire table when adding a column with a default value, split the operation into two parts: one that adds the column with no default value and another that sets the default:
ALTER TABLE users ADD COLUMN notes TEXT;
ALTER TABLE users ALTER COLUMN notes SET DEFAULT '';
Note that this leaves existing rows with NULL in the new column and the column itself nullable. You should probably do batched updates (e.g. setting a few thousand rows at a time) at some point to fill in the missing values and then alter the column to NOT NULL.
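The batching can be sketched as a generator of SQL statements, one per id range. This is an illustration under assumptions: a hypothetical backfill_statements helper, an integer id primary key on users, and a batch size picked arbitrarily. Each statement would be run in its own short transaction.

```ruby
# Hypothetical helper: builds one UPDATE per id range, followed by
# the final NOT NULL constraint.
def backfill_statements(min_id, max_id, batch_size: 5_000)
  statements = []
  min_id.step(max_id, batch_size) do |start|
    finish = [start + batch_size - 1, max_id].min
    statements << "UPDATE users SET notes = '' " \
                  "WHERE id BETWEEN #{start} AND #{finish} AND notes IS NULL;"
  end
  # Only safe once every existing row has a value.
  statements << "ALTER TABLE users ALTER COLUMN notes SET NOT NULL;"
end

backfill_statements(1, 12_000).each { |sql| puts sql }
```

The notes IS NULL condition keeps the updates from overwriting rows that the app has already written to while the backfill is running.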
PostgreSQL 11 makes column additions fast even with a default value, as long as the default is a constant, because it no longer rewrites the table.
Server
To do zero downtime deployments, you'll also need a web server that can start up new instances and gracefully shut down old ones while letting requests already in progress finish on their own.
This can be handled "above the stack" by an external load balancer (Nginx, HAProxy or Heroku's platform) in front of your Ruby webserver that optionally coordinates with your app on when to stop routing requests. Alternatively, you can go with Unicorn, which can also handle graceful restarts.
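What a graceful shutdown means for a single worker can be sketched in plain Ruby: stop accepting new requests, finish the ones already accepted, then exit. This is a minimal sketch, assuming a hypothetical GracefulWorker class. It is not how Unicorn or any particular server implements it, and it leaves out concerns such as concurrent submitters.

```ruby
# Hypothetical worker that drains accepted requests before stopping.
class GracefulWorker
  attr_reader :handled

  def initialize
    @queue = Queue.new
    @handled = []
    @accepting = true
    @thread = Thread.new { run }
  end

  # Returns false once shutdown has begun, so that a load balancer
  # could route the request to a new instance instead.
  def submit(request)
    return false unless @accepting

    @queue << request
    true
  end

  # Stop accepting, let the worker drain the queue, then wait for it.
  def shutdown
    @accepting = false
    @queue.close # pop returns nil once the closed queue is empty
    @thread.join
  end

  private

  def run
    while (request = @queue.pop)
      @handled << request # stand-in for servicing the request
    end
  end
end
```

Requests submitted before shutdown are all handled, and later ones are refused rather than dropped halfway.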
[1] Although imagine the resulting frustration if that message instead instructed the visitor to click the browser's back button and try again, only to have the error persist.