
CouchDB 0.8.1 Simple One-Way Replication

CouchDB 0.8.1 has been available for a few weeks, so I decided to see how it was coming along. For no real reason at all, I wanted to get a very simple replication process going between two CouchDB databases hosted on different machines. When I say "simple", I mean "don't get all excited because this is not a post about replicating all the documents reviewed in a Big Tobacco lawsuit!" Not that you would get excited about that either. I just wanted to see a few documents show up in another database on another machine. Simple enough. The CouchDB website, wiki, and mailing list archives were actually pretty good for getting started.

Easy Install

Thankfully, CouchDB has some pretty good installation information on its wiki and in its source distribution README file. In a short amount of time, I had a working CouchDB server on Ubuntu and OS X.

Ubuntu Hardy Heron 8.04 build and install

The CouchDB wiki page for Ubuntu contains links to the following two blog posts. I followed the Barking Iguana instructions.

http://barkingiguana.com/2008/06/28/installing-couchdb-080-on-ubuntu-804
http://www.chetanmittal.com/2008/6/15/install-couchdb-on-ubuntu-hardy-heron-8-04

Notes: The Barking Iguana update-rc.d step and the init.d copy step should be switched. Also, I would add the couchdb user as a system user. The CouchDB source distribution README file has this step in its instructions, but the home directory should be /usr/local/var/lib/couchdb.

    sudo adduser --system --home /usr/local/var/lib/couchdb --no-create-home \
        --shell /bin/bash --group --gecos "CouchDB Administrator" couchdb

To allow CouchDB to listen for external requests, modify the BindAddress in the /usr/local/etc/couchdb/couch.ini file, and update any firewall settings as needed:

    ;original
    ;BindAddress=127.0.0.1
    ;modified
    BindAddress=0.0.0.0

Bind the CouchDB server to an externally available IP address or to 0.0.0.0.

OS X Leopard binary

http://jan.prima.de/~jan/plok/archives/142-CouchDBX-Revival.html

Thanks to Jan L, one of the Apache CouchDB committers, CouchDBX is the easiest way to get a CouchDB server running on OS X. To allow CouchDB to listen for external requests, view the CouchDBX.app package contents, edit the /Contents/Resources/CouchDb/etc/couchdb/couch.ini file, and update the OS X firewall as needed.
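After changing the bind address on either platform, it can help to confirm from the other machine that the server answers over HTTP. The following is only a rough sketch using Ruby's standard library: the helper names are made up for illustration, the IP address in the usage comment is the one used later in this post, and it assumes the server's root URL returns a small JSON welcome document with a "version" field.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the root URL of a CouchDB server (5984 is the port used in this post).
def couchdb_root(host, port = 5984)
  URI("http://#{host}:#{port}/")
end

# Pull the version string out of the welcome document.
# Assumption: the root URL returns JSON containing a "version" field.
def couchdb_version(body)
  JSON.parse(body)["version"]
end

# Return the server's version string, or nil if it cannot be reached.
def check_couchdb(host, port = 5984)
  response = Net::HTTP.get_response(couchdb_root(host, port))
  response.is_a?(Net::HTTPSuccess) ? couchdb_version(response.body) : nil
rescue SystemCallError, SocketError, Timeout::Error => e
  warn "Could not reach #{host}:#{port}: #{e.message}"
  nil
end

# Example (run from the other machine):
#   puts check_couchdb("192.168.0.2").inspect
```

A nil result usually points at the bind address or a firewall rule rather than at CouchDB itself.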

Replicating 3 Documents

CouchDB comes with the Futon Utility Client, a browser-based user interface to set up databases, add/update/delete documents, run a CouchDB test suite, and manually initiate replication through its "Replicator" page. The Replicator tool offers replication from a source to a target database, hosted locally or remotely available through HTTP. It's pretty straightforward to use. Strangely, error messages are displayed in JSON format through a JavaScript alert, but it's not a big deal.

So, at this point, I had two CouchDB servers on two different machines. I created a database on each server by using the Futon client and used the same name for both databases. I then proceeded to add 3 documents with randomly added fields and values into my designated "source" database. Now that I had servers, databases, and documents, it was time to figure out how to replicate the database automatically...

Lo and behold, I found out that CouchDB currently doesn't offer "automated" replication without writing some code. Ok, fair enough. To get things started, I modified the couch.ini file by adding a line for DbUpdateNotificationProcess. More information can be found on the CouchDB wiki page for updating document views.

    DbUpdateNotificationProcess=/usr/local/var/lib/couchdb/potatoe.rb

Why is the information on a page about updating document views? Well, the notification hook can be used to update views before a user actually queries those views. The page contains an example Ruby script that does just that. In addition, the same notification hook can be used for kicking off full-text indexing. But I had simpler (useless) needs.

CouchDB takes care of starting the process and writes a short JSON message to the process's standard input each time a local database is updated. Make sure that the couchdb user has the proper privileges to run the script/executable.
[sourcecode="js"]
# example database update notification JSON message
{"type":"updated","db":"mytestdb"}
[/sourcecode]

Anyway, here's the (simple || stupid) script that got one-way replication going between two databases containing 3 documents hosted on 2 different machines sitting 3 feet apart. It uses the RestClient gem.

[sourcecode="ruby"]
#!/usr/bin/ruby
require 'rubygems'
require 'logger'
require 'json' # sudo gem install json
#require 'json/pure' # sudo gem install json_pure
require 'rest_client' # sudo gem install rest-client

# Keep 3 rotated log files of roughly 1 MB each.
logger = Logger.new('/usr/local/var/log/couchdb/potatoe.log', 3, 1024000)
logger.level = Logger::INFO

REPLICATE = "http://192.168.0.4:5984/_replicate"
SOURCE = "http://192.168.0.4:5984/mytestdb"
TARGET = "http://192.168.0.2:5984/mytestdb"

begin
  logger.info "Ready for CouchDB..."
  replicate = RestClient::Resource.new REPLICATE
  replicationMsg = {:source => SOURCE, :target => TARGET}.to_json

  loop do
    # CouchDB writes one JSON message per line to this process's stdin.
    unless (jsonOut = gets).nil?
      logger.debug jsonOut
      message = JSON.parse jsonOut
      if message["type"] == "updated" && message["db"] == "mytestdb"
        logger.info "'#{message['db']}' database updated."
        logger.info "Replicating..."
        response = replicate.post replicationMsg, :content_type => 'application/json'
        logger.debug response
        results = JSON.parse response
        if results["ok"]
          logger.info "Replication succeeded. " +
                      "session_id: #{results['session_id']} " +
                      "source_last_seq: #{results['source_last_seq']}"
        else
          # Currently, CouchDB 0.8.1 doesn't work this way.
          # It returns an HTTP 500 error instead of false.
          logger.info "Replication error: #{results}"
        end
      end
    else
      # gets returned nil: stdin was closed, so CouchDB has shut down.
      logger.info "CouchDB has gone away..."
      break
    end
  end
rescue Exception => e
  logger.error "Error message: #{e.message}"
  logger.error "Stack trace: #{e.backtrace.inspect}"
ensure
  logger.close
end
[/sourcecode]

Basically, the script makes sure that it cares about the database that was updated and POSTs a JSON string to the CouchDB server's _replicate "resource" (I know...calling it a resource doesn't make sense). This initiates the replication process between the SOURCE and TARGET databases.

The script doesn't stagger the updates -- it starts the replication process after every database update. This means that replication will occur each time a document is updated in the source database. And it doesn't do anything if the target database is unavailable...fault-tolerance-shmolerance!

With the script in place, permissions all set, and the CouchDB server restarted, all I had to do was update a document in the source database to initiate the replication process.
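For anyone who does want to stagger the updates, one possible approach is to count notifications and only replicate once enough have piled up or enough time has passed. This is a minimal sketch, not what the script above does: the ReplicationBatcher class, its method names, and the default thresholds of 10 events and 60 seconds are all made up for illustration.

```ruby
# Decides when to trigger replication: after `max_events` updates, or once
# `max_wait` seconds have passed since the first unreplicated update.
# Hypothetical helper; names and defaults are illustrative only.
class ReplicationBatcher
  def initialize(max_events = 10, max_wait = 60)
    @max_events = max_events
    @max_wait = max_wait
    reset
  end

  # Record one "updated" notification seen at time `now`.
  def record(now = Time.now)
    @first_seen ||= now
    @pending += 1
  end

  # True when a replication should be started at time `now`.
  def due?(now = Time.now)
    return false if @pending.zero?
    @pending >= @max_events || (now - @first_seen) >= @max_wait
  end

  # Call after a replication request has been sent.
  def reset
    @pending = 0
    @first_seen = nil
  end
end
```

In the notification loop, the idea would be to call record for each "updated" message, POST to _replicate only when due? returns true, and then call reset. For the time limit to fire while no messages are arriving, the loop would also need to wake up periodically (for example, by waiting on stdin with a timeout) instead of blocking forever on gets.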

That's it.

Yes, that's it. Three documents were replicated on the other machine. Seriously, it was everything I had hoped for, and more. :-) I'm sure other people are going to replicate millions of documents across thousands of CouchDB servers and databases on all types of hardware and networks. With documents containing a ridiculous number of fields and content. And maybe using another CouchDB database to store a list of databases to replicate, and replicating that database as well. And obviously all in Erlang too...but not me, at least not today.

Notes:
  • The _replicate response message contains the "ok" result as well as a history of replication events. I found out through the CouchDB mailing list that the response will contain information about the last 50 replication events. However, I noticed that the returned events were unique to the client that initiated the replication. I didn't see the same history when I kicked off replication through a Ruby irb session, the DbUpdateNotificationProcess script, or through the Futon client.
  • You can also do a hot-backup copy of the database files if you don't need "replication" like this and if your database files are small.
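For poking at the _replicate response from an irb session without the RestClient gem, something along these lines should work with only Ruby's standard library. It is a sketch under assumptions: the helper names are made up, and the request body simply mirrors the source/target JSON that the script above sends.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) the POST request for a server's _replicate URL.
# Hypothetical helper; returns the target URI and the prepared request.
def build_replicate_request(server, source, target)
  uri = URI.join(server, "/_replicate")
  request = Net::HTTP::Post.new(uri.path)
  request["Content-Type"] = "application/json"
  request.body = { "source" => source, "target" => target }.to_json
  [uri, request]
end

# Send the request and return the parsed JSON response body.
def replicate(server, source, target)
  uri, request = build_replicate_request(server, source, target)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body)
end

# Example, using the addresses from this post:
#   result = replicate("http://192.168.0.4:5984",
#                      "http://192.168.0.4:5984/mytestdb",
#                      "http://192.168.0.2:5984/mytestdb")
#   puts result["ok"]
```

Since 0.8.1 answers a failed replication with an HTTP 500, a more careful version would check the response status before parsing the body.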

Update

I placed a JRuby script on GitHub that does batch and timed replications: Jpotatoe: A simple JRuby script for one-way CouchDB 0.8.1 replication Feedback is welcome.


11 Comments

Sep 08, 2008
Jan said...
Way cool! Thanks for sharing! Feel free to add anything here to the CouchDB wiki :)

One note: You usually don't want to trigger replication for single document changes but have it work in batches, rather. So queueing up 10 or so events before sending the trigger might be a good idea. Now, what to do when there are 9 events and then none for an hour. The replicated-to server will be out of sync for a while. If that's no big deal, ok, if it is, add another metric to the soup, a timeout, that triggers replication after X seconds or minutes, regardless of how many events came through.

Cheers
Jan
--

Sep 08, 2008
Noah Slater said...
Nice article, thanks for the writeup.

Good catch with the 'adduser' command in the README, I have fixed that now.

Sep 08, 2008
robertor said...
Thanks for the note Jan. I agree with you; the script doesn't do batch-based replication and doesn't have any logic to properly handle failed replications. Both would be pretty important in a "real" situation. :-)
Sep 17, 2008
gigdoggy said...
Really useful post, thank you very much for taking the time to do this.
Sep 17, 2008
gigdoggy said...
While playing around with your code, I noticed that I was getting HTTP 500's, caused by the "source" parameter in the REST post: changing it to "myDB" instead of "http://localhost:5984/myDB" fixed it.
Again, thanks for this great tutorial.
Sep 17, 2008
robertor said...
@gigdoggy That's odd...but I'm glad it worked for you! Thanks for the comments.
Sep 17, 2008
robertor said...
Oh, note that in the Subversion trunk, the .ini file name has changed to local.ini and the ini setting has also changed to be:

[update_notification]
;unique notifier name=/full/path/to/exe -with "cmd line arg"

Oct 12, 2008
CouchDB resources | Tech Mix said...
[...] on Rails (nice one, detailed, step-by-step, but I’m not a rail developer, may be later on) Simple one way replication [...]
Nov 16, 2008
Matias Quaglia said...
Really good post.

Thanks.

I was stuck installing couchDB until I read it.

Really helpful!

Dec 07, 2008
Dave Johnson said...
Great post! It turned out to be just what I needed to get started!

I have been meaning to check out CouchDB for a while and always wondering just how exactly the replication stuff got implemented.

May 10, 2010
Jan-Piet Mens » Update notifications in CouchDB: tweeting urgent documents said...
[...] database name and go and look for what has changed. Or for example, the program could automatically launch replication, if you don't want to use continuous_replication as exists in CouchDB [...]
