CouchDB 0.8.1 Simple One-Way Replication
CouchDB 0.8.1 has been available for a few weeks, so I decided to see how it was coming along. For no real reason at all, I wanted to get a very simple replication process going between two CouchDB databases hosted on different machines. When I say "simple", I mean "don't get all excited because this is not a post about replicating all the documents reviewed in a Big Tobacco lawsuit!" Not that you would get excited about that either. I just wanted to see a few documents show up in another database on another machine. Simple enough.
The CouchDB website, wiki, and mailing list archives were actually pretty good for getting started.
Easy Install
Thankfully, CouchDB has some pretty good installation information on its wiki and in its source distribution README file. In a short amount of time, I had a working CouchDB server on Ubuntu and OS X.
Ubuntu Hardy Heron 8.04 build and install
The CouchDB wiki page for Ubuntu contains links to the following two blog posts. I followed the Barking Iguana instructions.
http://barkingiguana.com/2008/06/28/installing-couchdb-080-on-ubuntu-804
http://www.chetanmittal.com/2008/6/15/install-couchdb-on-ubuntu-hardy-heron-8-04
Notes: The Barking Iguana update-rc.d step and the init.d copy step should be switched. Also, I would add the couchdb user as a system user. The CouchDB source distribution README file has this step in its instructions, but the home directory should be /usr/local/var/lib/couchdb.
sudo adduser --system --home /usr/local/var/lib/couchdb --no-create-home \
--shell /bin/bash --group --gecos "CouchDB Administrator" couchdb
To allow CouchDB to listen for external requests, modify the BindAddress in the /usr/local/etc/couchdb/couch.ini file, and update any firewall settings as needed:
;original
;BindAddress=127.0.0.1
;modified
BindAddress=0.0.0.0
Bind the CouchDB server to an externally available IP address or to 0.0.0.0.
OS X Leopard binary
http://jan.prima.de/~jan/plok/archives/142-CouchDBX-Revival.html
Thanks to Jan L, one of the Apache CouchDB committers, CouchDBX is the easiest way to get a CouchDB server running on OS X. To allow CouchDB to listen for external requests, view the CouchDBX.app package contents, edit the /Contents/Resources/CouchDb/etc/couchdb/couch.ini file, and update the OS X firewall as needed.
Replicating 3 Documents
CouchDB comes with the Futon Utility Client, a browser-based user interface to set up databases, add/update/delete documents, run a CouchDB test suite, and manually initiate replication through its "Replicator" page. The Replicator tool replicates from a source database to a target database, each of which can be local or remote (reachable over HTTP). Pretty straightforward to use. Strangely, error messages are displayed in JSON format through a JavaScript alert, but it's not a big deal.
So, at this point, I had two CouchDB servers on two different machines. I created a database on each server by using the Futon client and used the same name for both databases. I then proceeded to add 3 documents with randomly added fields and values into my designated "source" database. Now that I had servers, databases, and documents, it was time to figure out how to replicate the database automatically...
Lo and behold, I found out that CouchDB currently doesn't offer "automated" replication without writing some code. OK, fair enough. To get things started, I modified the couch.ini file by adding a line for DbUpdateNotificationProcess. More information can be found on the CouchDB wiki page for updating document views.
DbUpdateNotificationProcess=/usr/local/var/lib/couchdb/potatoe.rb
Why is the information on a page about updating document views? Well, the notification hook can be used to update views before a user actually queries those views. The page contains an example Ruby script that does just that. In addition, the same notification hook can be used for kicking off full-text indexing. But I had simpler (useless) needs.
CouchDB will take care of starting the process and will write a short JSON message to the process's standard input each time a local database is updated. Make sure that the couchdb user has the proper privileges to run the script/executable.
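The filtering step can be tried in isolation before wiring anything into CouchDB. This is a minimal sketch, not part of the original script: `relevant_update?` is a hypothetical helper name, and it assumes each notification arrives as a single line of JSON with `type` and `db` keys.

```ruby
require 'json'

# Hypothetical helper (not from the original script): returns true when a
# notification line reports an update to the database we care about.
def relevant_update?(line, db_name)
  message = JSON.parse(line)
  message.is_a?(Hash) && message["type"] == "updated" && message["db"] == db_name
rescue JSON::ParserError
  false # ignore lines that are not valid JSON
end

# Possible use inside a notification process:
#   while (line = $stdin.gets)
#     start_replication if relevant_update?(line, "mytestdb")  # start_replication is hypothetical
#   end
```

Keeping the check in its own method makes it easy to feed it hand-written lines from irb without touching a running server.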
[sourcecode="js"]
# example database update notification JSON message
{"type":"updated","db":"mytestdb"}
[/sourcecode]
Anyway, here's the (simple || stupid) script that got one-way replication going between two databases containing 3 documents hosted on 2 different machines sitting 3 feet apart. It uses the RestClient gem.
[sourcecode="ruby"]
#!/usr/bin/ruby
require 'rubygems'
require 'logger'
require 'json' # sudo gem install json
#require 'json/pure' # sudo gem install json_pure
require 'rest_client' # sudo gem install rest-client

logger = Logger.new('/usr/local/var/log/couchdb/potatoe.log', 3, 1024000)
logger.level = Logger::INFO

REPLICATE = "http://192.168.0.4:5984/_replicate"
SOURCE = "http://192.168.0.4:5984/mytestdb"
TARGET = "http://192.168.0.2:5984/mytestdb"

begin
  logger.info "Ready for CouchDB..."
  replicate = RestClient::Resource.new REPLICATE
  replicationMsg = {:source => SOURCE, :target => TARGET}.to_json
  loop do
    unless (jsonOut = gets).nil?
      logger.debug jsonOut
      message = JSON.parse jsonOut
      if message["type"] == "updated" and message["db"] == "mytestdb"
        logger.info "'#{message['db']}' database updated."
        logger.info "Replicating..."
        response = replicate.post replicationMsg,
                                  :content_type => 'application/json'
        logger.debug response
        results = JSON.parse response
        if results["ok"]
          logger.info "Replication succeeded. " +
                      "session_id: #{results['session_id']} " +
                      "source_last_seq: #{results['source_last_seq']}"
        else
          # Currently, CouchDB 0.8.1 doesn't work this way.
          # It returns an HTTP 500 error instead of false.
          logger.info "Replication error: #{results}"
        end
      end
    else
      logger.info "CouchDB has gone away..."
      break
    end
  end
rescue Exception => e
  logger.error "Error message: #{e.message}"
  logger.error "Stack trace: #{e.backtrace.inspect}"
ensure
  logger.close
end
[/sourcecode]
Basically, the script makes sure that it cares about the database that was updated and POSTs a JSON string to the CouchDB server's _replicate "resource" (I know...calling it a resource doesn't make sense). This initiates the replication process between the SOURCE and TARGET databases. The script doesn't stagger the updates -- it starts the replication process after every database update. This means that replication will occur each time a document is updated in the source database. And it doesn't do anything if the target database is unavailable...fault-tolerance-shmolerance!
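For anyone who does want staggering and a bit of fault tolerance, here is one possible shape for it. This is a rough sketch under stated assumptions, not something the script above does: `ReplicationThrottle` and `replicate_safely` are hypothetical names, and the throttle simply skips updates that arrive too soon after the previous replication, so a final pass after a burst would still be needed to catch the last changes.

```ruby
# Hypothetical helper: allows at most one replication per min_interval seconds.
# The clock is injectable so the behavior can be checked without sleeping.
class ReplicationThrottle
  def initialize(min_interval, clock = -> { Time.now.to_f })
    @min_interval = min_interval
    @clock = clock
    @last_run = nil
  end

  # Yields and returns true if enough time has passed since the last run;
  # otherwise skips the block and returns false.
  def attempt
    now = @clock.call
    return false if @last_run && (now - @last_run) < @min_interval
    @last_run = now
    yield
    true
  end
end

# Hypothetical helper: runs the block and reports failure instead of raising,
# so one failed replication (for example, an unreachable target) does not
# end the notification loop.
def replicate_safely(logger = nil)
  yield
  true
rescue StandardError => e
  logger.error "Replication failed: #{e.message}" if logger
  false
end

# Possible use inside the loop of the script above:
#   throttle = ReplicationThrottle.new(5)
#   throttle.attempt do
#     replicate_safely(logger) do
#       replicate.post replicationMsg, :content_type => 'application/json'
#     end
#   end
```

Rescuing StandardError rather than Exception lets signals and interpreter exits pass through while still catching HTTP and connection errors.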
With the script in place, permissions all set, and the CouchDB server restarted, all I had to do was update a document in the source database to initiate the replication process.
That's it.
Yes, that's it. Three documents were replicated on the other machine. Seriously, it was everything I had hoped for, and more. :-) I'm sure other people are going to replicate millions of documents across thousands of CouchDB servers and databases on all types of hardware and networks. With documents containing a ridiculous number of fields and content. And maybe using another CouchDB database to store a list of databases to replicate, and replicating that database as well. And obviously all in Erlang too...but not me, at least not today.
Notes:
- The _replicate response message contains the "ok" result as well as a history of replication events. I found out through the CouchDB mailing list that the response will contain information about the last 50 replication events. However, I noticed that the returned events were unique to the client that initiated the replication. I didn't see the same history when I kicked off replication through a Ruby irb session, the DbUpdateNotificationProcess script, or through the Futon client.
- You can also do a hot-backup copy of the database files if you don't need "replication" like this and if your database files are small.
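A file copy like that can be scripted in a few lines. This is a minimal sketch with assumptions called out: `backup_database_file` is a hypothetical helper, and the example path (a `mytestdb.couch` file under /usr/local/var/lib/couchdb) is a guess at where the database file lives, so check your own install first.

```ruby
require 'fileutils'

# Hypothetical helper: copies a database file into backup_dir with a
# timestamp suffix and returns the path of the copy.
def backup_database_file(db_file, backup_dir, now = Time.now)
  FileUtils.mkdir_p(backup_dir)
  stamp = now.strftime("%Y%m%d-%H%M%S")
  target = File.join(backup_dir, "#{File.basename(db_file)}.#{stamp}")
  FileUtils.cp(db_file, target)
  target
end

# Example call (both paths are assumptions, not taken from the post):
#   backup_database_file("/usr/local/var/lib/couchdb/mytestdb.couch",
#                        "/var/backups/couchdb")
```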