blog.idearise.com

the unofficial idearise blog

CouchDB 0.8.1 Simple One-Way Replication

with 10 comments

CouchDB 0.8.1 has been available for a few weeks, so I decided to see how it was coming along. For no real reason at all, I wanted to get a very simple replication process going between two CouchDB databases hosted on different machines. When I say “simple”, I mean “don’t get all excited because this is not a post about replicating all the documents reviewed in a Big Tobacco lawsuit!” Not that you would get excited about that either. I just wanted to see a few documents show up in another database on another machine. Simple enough.

The CouchDB website, wiki, and mailing list archives were actually pretty good for getting started.

Easy Install

Thankfully, CouchDB has some pretty good installation information on its wiki and in its source distribution README file. In a short amount of time, I had a working CouchDB server on Ubuntu and OS X.

Ubuntu Hardy Heron 8.04 build and install

The CouchDB wiki page for Ubuntu links to the following two blog posts. I followed the Barking Iguana instructions.

http://barkingiguana.com/2008/06/28/installing-couchdb-080-on-ubuntu-804
http://www.chetanmittal.com/2008/6/15/install-couchdb-on-ubuntu-hardy-heron-8-04

Notes:
The Barking Iguana update-rc.d step and the init.d copy step should be swapped: copy the init script into /etc/init.d first, then run update-rc.d. I would also create the couchdb user as a system user. The CouchDB source distribution README file includes this step, but the home directory should be /usr/local/var/lib/couchdb:

sudo adduser --system --home /usr/local/var/lib/couchdb --no-create-home \
--shell /bin/bash --group --gecos "CouchDB Administrator" couchdb

To allow CouchDB to listen for external requests, modify the BindAddress in the /usr/local/etc/couchdb/couch.ini file, and update any firewall settings as needed:

;original
;BindAddress=127.0.0.1
;modified
BindAddress=0.0.0.0

Set BindAddress to an externally reachable IP address of the machine, or to 0.0.0.0 to listen on all network interfaces.
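After restarting the server, a quick way to confirm that it accepts external connections is to request the server root from the other machine; CouchDB answers with a small JSON welcome message. This is a minimal sketch using only Ruby's standard library. The helper names are hypothetical, and the address in the usage comment is just the one from the replication script further down.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Hypothetical helper: true if a response body looks like CouchDB's root
# welcome message, e.g. {"couchdb":"Welcome","version":"..."}.
def couchdb_welcome?(body)
  info = JSON.parse(body)
  info.is_a?(Hash) && info["couchdb"] == "Welcome"
rescue JSON::ParserError
  false
end

# Fetch the server root; false if nothing answers or it isn't CouchDB.
def check_couchdb(base_url)
  response = Net::HTTP.get_response(URI.parse(base_url))
  response.is_a?(Net::HTTPSuccess) && couchdb_welcome?(response.body)
rescue SystemCallError, SocketError, Timeout::Error
  false
end

# Usage: ruby check_couchdb.rb http://192.168.0.2:5984/
if __FILE__ == $0 && ARGV[0]
  puts(check_couchdb(ARGV[0]) ? "CouchDB is reachable" : "No CouchDB answer")
end
```

If this reports no answer from the other machine but works locally, the BindAddress change or the firewall is the usual suspect.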

OS X Leopard binary

http://jan.prima.de/~jan/plok/archives/142-CouchDBX-Revival.html

Thanks to Jan L, one of the Apache CouchDB committers, CouchDBX is the easiest way to get a CouchDB server running on OS X. To allow CouchDB to listen for external requests, open the CouchDBX.app package contents (Show Package Contents in the Finder), edit the /Contents/Resources/CouchDb/etc/couchdb/couch.ini file, change BindAddress as described above, and update the OS X firewall as needed.

Replicating 3 Documents

CouchDB comes with the Futon Utility Client, a browser-based user interface for setting up databases, adding, updating, and deleting documents, running the CouchDB test suite, and manually initiating replication through its “Replicator” page. The Replicator copies documents from a source database to a target database; each can be local or a remote database reachable over HTTP. It’s pretty straightforward to use. Strangely, error messages are displayed as raw JSON in a JavaScript alert, but it’s not a big deal.
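A replication boils down to a single POST of a small JSON object to the server’s _replicate URL (the same call the script later in this post makes). For a one-off replication from the command line, a minimal sketch using Ruby’s standard library instead of RestClient might look like this. The helper names are hypothetical; source and target can be local database names or full database URLs.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Build (but don't send) a POST /_replicate request asking the server at
# base_url to copy documents from source to target.
def build_replication_request(base_url, source, target)
  uri = URI.join(base_url, "/_replicate")
  request = Net::HTTP::Post.new(uri.path)
  request["Content-Type"] = "application/json"
  request.body = { "source" => source, "target" => target }.to_json
  [uri, request]
end

# Send the request and return CouchDB's parsed JSON reply.
def replicate_once(base_url, source, target)
  uri, request = build_replication_request(base_url, source, target)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end

# Usage: ruby replicate_once.rb http://192.168.0.4:5984/ mytestdb http://192.168.0.2:5984/mytestdb
if __FILE__ == $0 && ARGV.size == 3
  p replicate_once(*ARGV)
end
```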

So, at this point, I had two CouchDB servers on two different machines. I created a database on each server using the Futon client, giving both databases the same name. I then added 3 documents with arbitrary fields and values to my designated “source” database. Now that I had servers, databases, and documents, it was time to figure out how to replicate the database automatically…
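The same setup can also be scripted against CouchDB’s HTTP API instead of clicking through Futon. This is a minimal sketch, assuming that `PUT /dbname` creates a database and `POST /dbname` stores a new document under a server-generated ID; the sample documents and helper names are made up for illustration.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Build the requests that create a database (PUT /dbname) and then add
# each document to it (POST /dbname, letting the server pick the _id).
def build_setup_requests(db_name, docs)
  requests = [Net::HTTP::Put.new("/#{db_name}")]
  docs.each do |doc|
    post = Net::HTTP::Post.new("/#{db_name}")
    post["Content-Type"] = "application/json"
    post.body = doc.to_json
    requests << post
  end
  requests
end

# Send the requests in order and print each status line.
def run_setup(base_url, db_name, docs)
  uri = URI.parse(base_url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    build_setup_requests(db_name, docs).each do |req|
      res = http.request(req)
      puts "#{req.method} #{req.path} -> #{res.code} #{res.body}"
    end
  end
end

# Hypothetical sample documents with arbitrary fields.
SAMPLE_DOCS = [
  { "name" => "potato", "color" => "brown" },
  { "name" => "tomato", "color" => "red", "ripe" => true },
  { "count" => 3 }
]

# Usage: ruby setup_db.rb http://192.168.0.4:5984/
if __FILE__ == $0 && ARGV[0]
  run_setup(ARGV[0], "mytestdb", SAMPLE_DOCS)
end
```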

Lo and behold, I found out that CouchDB currently doesn’t offer “automated” replication without writing some code. OK, fair enough. To get things started, I modified the couch.ini file by adding a DbUpdateNotificationProcess line. More information can be found on the CouchDB wiki page about updating document views.

DbUpdateNotificationProcess=/usr/local/var/lib/couchdb/potatoe.rb

Why is the information on a page about updating document views? Well, the notification hook can be used to update views before a user actually queries those views. The page contains an example Ruby script that does just that. In addition, the same notification hook can be used for kicking off full-text indexing. But I had simpler (useless) needs.

CouchDB takes care of starting the process and, each time a local database is updated, writes a short one-line JSON message to the process’s standard input. Make sure that the couchdb user has permission to execute the script.


# example database update notification JSON message
{"type":"updated","db":"mytestdb"}
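Before wiring in replication, it can help to watch the notification protocol on its own. This is a minimal sketch of a notification reader: it reads one JSON object per line until CouchDB closes the pipe, skipping anything it can’t parse. `each_notification` is a hypothetical helper name, and passing the log path as a command-line argument is an assumption (hardcode the path if your couch.ini entry can’t carry arguments).

```ruby
require 'json'

# Read CouchDB update notifications from io (stdin when CouchDB runs us),
# one JSON object per line, yielding each parsed message. Returns when
# CouchDB closes the pipe (EOF). Unparsable lines are skipped.
def each_notification(io)
  while (line = io.gets)
    message = begin
      JSON.parse(line)
    rescue JSON::ParserError
      nil
    end
    yield message if message.is_a?(Hash)
  end
end

# Usage (as the notification process): append every message to a log file.
if __FILE__ == $0 && ARGV[0]
  File.open(ARGV[0], "a") do |log|
    log.sync = true
    each_notification(STDIN) { |message| log.puts message.inspect }
  end
end
```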

Anyway, here’s the (simple || stupid) script that got one-way replication going between two databases containing 3 documents hosted on 2 different machines sitting 3 feet apart. It uses the RestClient gem.


#!/usr/bin/ruby

require 'rubygems'
require 'logger'
require 'json' # sudo gem install json
#require 'json/pure' # sudo gem install json_pure
require 'rest_client' # sudo gem install rest-client

# Log to a file (3 rotated files of about 1 MB each); stdin belongs to CouchDB.
logger = Logger.new('/usr/local/var/log/couchdb/potatoe.log', 3, 1024000)
logger.level = Logger::INFO

DB_NAME = "mytestdb"
REPLICATE = "http://192.168.0.4:5984/_replicate"
SOURCE = "http://192.168.0.4:5984/#{DB_NAME}"
TARGET = "http://192.168.0.2:5984/#{DB_NAME}"

begin
  logger.info "Ready for CouchDB..."
  replicate = RestClient::Resource.new REPLICATE
  replication_msg = {:source => SOURCE, :target => TARGET}.to_json

  # CouchDB writes one JSON line to stdin per database update;
  # gets returns nil once CouchDB closes the pipe.
  while (json_out = STDIN.gets)
    logger.debug json_out
    message = JSON.parse json_out

    # Ignore updates to any other database.
    next unless message["type"] == "updated" && message["db"] == DB_NAME

    logger.info "'#{message['db']}' database updated."
    logger.info "Replicating..."
    begin
      response = replicate.post replication_msg,
                                :content_type => 'application/json'
      logger.debug response
      results = JSON.parse response
      if results["ok"]
        logger.info "Replication succeeded. " +
                    "session_id: #{results['session_id']} " +
                    "source_last_seq: #{results['source_last_seq']}"
      else
        logger.info "Replication error: #{results.inspect}"
      end
    rescue StandardError => e
      # CouchDB 0.8.1 reports a failed replication (e.g. an unreachable
      # target) as an HTTP 500 rather than "ok": false, and RestClient
      # raises on error statuses. Log it and keep listening.
      logger.error "Replication request failed: #{e.message}"
    end
  end
  logger.info "CouchDB has gone away..."
rescue StandardError => e
  logger.error "Error message: #{e.message}"
  logger.error "Stack trace: #{e.backtrace.inspect}"
ensure
  logger.close
end

Basically, the script checks whether the updated database is the one it cares about and, if so, POSTs a JSON string to the CouchDB server’s _replicate “resource” (I know…calling it a resource doesn’t make sense). This initiates replication between the SOURCE and TARGET databases. The script doesn’t stagger the updates: it starts a replication after every database update, so replication runs each time a document is updated in the source database. And it doesn’t do anything if the target database is unavailable…fault-tolerance-shmolerance!
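As Jan suggests in the comments below, a less chatty approach is to batch updates: replicate after N updates, or after a timeout if fewer than N arrive. This is a minimal sketch of that policy. The class and method names, the batch size, and the timeout are all hypothetical, and the block passed to `run_batched` stands in for the `replicate.post` call from the script above. A background thread checks the timeout so that a quiet period still flushes pending updates.

```ruby
require 'json'

# Hypothetical batching policy: trigger after batch_size updates, or once
# max_wait seconds have passed since the first pending update.
class ReplicationBatcher
  def initialize(batch_size, max_wait)
    @batch_size = batch_size
    @max_wait = max_wait
    reset
  end

  def record_update(now)
    @first_pending_at ||= now
    @pending += 1
  end

  def due?(now)
    return false if @pending.zero?
    @pending >= @batch_size || now - @first_pending_at >= @max_wait
  end

  # Run the block (a replication) and clear the batch if one is due.
  # If the block raises, the batch stays pending.
  def flush_if_due(now)
    return false unless due?(now)
    yield
    reset
    true
  end

  def reset
    @pending = 0
    @first_pending_at = nil
  end
end

# Read notifications from io and call the block when a batch is due.
# The block should rescue its own errors: an exception raised inside the
# timer thread would silently stop the timeout checks.
def run_batched(io, db_name, batcher, check_every = 1, &trigger)
  lock = Mutex.new
  timer = Thread.new do
    loop do
      sleep check_every
      lock.synchronize { batcher.flush_if_due(Time.now, &trigger) }
    end
  end
  while (line = io.gets)
    message = (JSON.parse(line) rescue nil)
    next unless message.is_a?(Hash) &&
                message["type"] == "updated" && message["db"] == db_name
    lock.synchronize do
      batcher.record_update(Time.now)
      batcher.flush_if_due(Time.now, &trigger)
    end
  end
ensure
  timer.kill if timer
end
```

With `ReplicationBatcher.new(10, 300)`, for example, the target would be at most about five minutes behind while the source sees only occasional writes, and a burst of writes would cost one replication per ten updates instead of one per update.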

With the script in place, permissions all set, and the CouchDB server restarted, all I had to do was update a document in the source database to initiate the replication process.

That’s it.

Yes, that’s it. Three documents were replicated to the other machine. Seriously, it was everything I had hoped for, and more. :-)

I’m sure other people are going to replicate millions of documents across thousands of CouchDB servers and databases on all types of hardware and networks. With documents containing a ridiculous number of fields and content. And maybe using another CouchDB database to store a list of databases to replicate, and replicating that database as well. And obviously all in Erlang too…but not me, at least not today.

Notes:

  • The _replicate response message contains the “ok” result as well as a history of replication events. I found out through the CouchDB mailing list that the response will contain information about the last 50 replication events. However, I noticed that the returned events were specific to the client that initiated the replication: kicking off replication from a Ruby irb session, from the DbUpdateNotificationProcess script, and from the Futon client each showed a different history.
  • You can also do a hot-backup copy of the database files if you don’t need “replication” like this and if your database files are small.
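For the hot-backup route mentioned in the last note, a timestamped file copy is about all it takes. This is a minimal sketch that assumes each database is stored as a single `<name>.couch` file in the CouchDB data directory, taken here to be /usr/local/var/lib/couchdb (the couchdb user’s home directory from the install notes above); check couch.ini for the actual location on your machine. The helper name is hypothetical, and as the note says, this only makes sense for small database files.

```ruby
require 'fileutils'

# Copy <db_dir>/<db_name>.couch to a timestamped file in backup_dir and
# return the path of the copy. The file layout is an assumption.
def backup_couch_file(db_dir, db_name, backup_dir, now = Time.now)
  source = File.join(db_dir, "#{db_name}.couch")
  FileUtils.mkdir_p(backup_dir)
  stamp = now.strftime('%Y%m%d%H%M%S')
  target = File.join(backup_dir, "#{db_name}-#{stamp}.couch")
  FileUtils.cp(source, target, :preserve => true)
  target
end

# Usage: ruby backup_couch.rb /path/to/backups
if __FILE__ == $0 && ARGV[0]
  puts backup_couch_file("/usr/local/var/lib/couchdb", "mytestdb", ARGV[0])
end
```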

Update

I placed a JRuby script on GitHub that does batch and timed replications:
Jpotatoe: A simple JRuby script for one-way CouchDB 0.8.1 replication

Feedback is welcome.

Written by robertor

2008 September 8 at 12:08 am

Posted in play, read, work


10 Responses


  1. Way cool! Thanks for sharing! Feel free to add anything here to the CouchDB wiki :)

    One note: You usually don’t want to trigger replication for single document changes but have it work in batches, rather. So queueing up 10 or so events before sending the trigger might be a good idea. Now, what to do when there are 9 events and then none for an hour? The replicated-to server will be out of sync for a while. If that’s no big deal, ok, if it is, add another metric to the soup, a timeout, that triggers replication after X seconds or minutes, regardless of how many events came through.

    Cheers
    Jan

    Jan

    2008 September 8 at 7:06 am

  2. Nice article, thanks for the writeup.

    Good catch with the ‘adduser’ command in the README, I have fixed that now.

    Noah Slater

    2008 September 8 at 4:11 pm

  3. Thanks for the note Jan. I agree with you; the script doesn’t do batch-based replication and doesn’t have any logic to properly handle failed replications. Both would be pretty important in a “real” situation. :-)

    robertor

    2008 September 8 at 4:18 pm

  4. Really useful post, thank you very much for taking the time to do this.

    gigdoggy

    2008 September 17 at 2:24 pm

  5. While playing around with your code, I noticed that I was getting HTTP 500’s, caused by the “source” parameter in the REST post: changing it to “myDB” instead of “http://localhost:5984/myDB” fixed it.
    Again, thanks for this great tutorial.

    gigdoggy

    2008 September 17 at 3:01 pm

  6. @gigdoggy That’s odd…but I’m glad it worked for you! Thanks for the comments.

    robertor

    2008 September 17 at 3:27 pm

  7. Oh, note that in the Subversion trunk, the .ini file name has changed to local.ini and the ini setting has also changed to be:

    [update_notification]
    ;unique notifier name=/full/path/to/exe -with “cmd line arg”

    robertor

    2008 September 17 at 3:29 pm

  8. [...] on Rails (nice one, detailed, step-by-step, but I’m not a rail developer, may be later on) Simple one way replication [...]

  9. Really good post.

    Thanks.

    I was stuck installing couchDB until I read it.

    Really helpful!

    Matias Quaglia

    2008 November 16 at 1:22 pm

  10. Great post! It turned out to be just what I needed to get started!

    I have been meaning to check out CouchDB for a while and always wondering just how exactly the replication stuff got implemented.

    Dave Johnson

    2008 December 7 at 11:11 pm

