json_pure 1.1.3 patch

Following up on a previous blog post about a JSON transformation issue, there is now a patched version of the json_pure 1.1.3 gem on GitHub.

The issue had to do with transforming Hashes and Arrays to JSON when they contain objects (classes) that have their own to_json methods.

A typical error message is:

lib/ruby/gems/1.8/gems/json_pure-1.1.3/lib/json/pure/generator.rb:251:in `to_json': wrong # of arguments(2 for 0) (ArgumentError)

See the README file and the commit history for generator.rb for more info.

No warranties or guarantees that this will work for other users and uses of the json_pure gem! See the GPL license.

Feedback is welcome!

Ruby and JRuby JSON Time with json, json_pure, and Merb + DataMapper extlib

After trying to run a super-simple Ruby script in JRuby that ran fine through the MRI, I found myself trying to debug a JSON generator exception.

The Ruby script required a RubyGem that had its own dependencies on the json gem and Merb + DataMapper extlib gem. I thought that I could simply rebuild the gem to use json_pure and run things on JRuby. Little did I know…

First, here are the versions of everything involved:

  • Ruby 1.8.6
  • JRuby 1.1.4
  • rubygems 1.2.0
  • json 1.1.3
  • json_pure 1.1.3
  • extlib 0.9.6

Let’s take a simple example of converting Time to JSON. My environment uses UTC as the timezone. Bear with me. The code samples are repetitive, but I had to go through them to figure things out.

require 'rubygems'
require 'json'

p [ JSON.parser, JSON.generator ]
p Time.now.to_json

We’ll run it using the MRI and JRuby.

MRI

[JSON::Ext::Parser, JSON::Ext::Generator]
"\"Fri Sep 12 20:20:06 +0000 2008\""

JRuby

[JSON::Pure::Parser, JSON::Pure::Generator]
"\"Fri Sep 12 20:20:55 +0000 2008\""

Seems like a reasonable conversion and in the same format. JRuby is using json_pure as expected. Good, it’s consistent. Note that the output is not in ISO format.

Now, let’s specify the json_pure gem.

require 'rubygems'
require 'json/pure'

p [ JSON.parser, JSON.generator ]
p Time.now.to_json

Run it through the MRI and JRuby.

MRI

[JSON::Pure::Parser, JSON::Pure::Generator]
"\"Fri Sep 12 20:29:13 +0000 2008\""

JRuby

[JSON::Pure::Parser, JSON::Pure::Generator]
"\"Fri Sep 12 20:29:45 +0000 2008\""

Again, everything is consistent. Good news again.

Since things are going well, let’s try using extlib in the script so that the Time is in ISO format.

require 'rubygems'
require 'extlib'
require 'json'

p [ JSON.parser, JSON.generator ]
p Time.now.to_json

Run it through the MRI and JRuby.

MRI

[JSON::Ext::Parser, JSON::Ext::Generator]
"\"2008-09-12T20:44:06+00:00\""

JRuby

[JSON::Pure::Parser, JSON::Pure::Generator]
"\"2008-09-12T20:44:31+00:00\""

Excellent! Everything is working wonderfully. No problems at all!

At this point, let me do what I wanted to do in the first place and modify the script to be a basic variation of the problematic Ruby script that gave me a headache.

I’m going to place the time in a Hash, but I won’t include extlib yet.

require 'rubygems'
require 'json'

p [ JSON.parser, JSON.generator ]
p Time.now.to_json

h = {"created_on" => Time.now}
p h.to_json

Run it through the MRI and JRuby.

MRI

[JSON::Ext::Parser, JSON::Ext::Generator]
"\"Fri Sep 12 20:48:50 +0000 2008\""
"{\"created_on\":\"Fri Sep 12 20:48:50 +0000 2008\"}"

JRuby

[JSON::Pure::Parser, JSON::Pure::Generator]
"\"Fri Sep 12 20:49:26 +0000 2008\""
"{\"created_on\":\"Fri Sep 12 20:49:26 +0000 2008\"}"

Fantastic! It’s a useless script, but it’s going places: The Time is in a Hash that is being converted to JSON.

Ok, let’s add extlib so that we can get the JSON Time ISO format…and because our problematic Ruby script uses a particular gem (a very useful gem for the Ruby script) that happens to depend on extlib.

require 'rubygems'
require 'extlib'
require 'json'

p [ JSON.parser, JSON.generator ]
p Time.now.to_json

h = {"created_on" => Time.now}
p h.to_json

Run it through the MRI and JRuby.

MRI

[JSON::Ext::Parser, JSON::Ext::Generator]
"\"2008-09-12T20:51:43+00:00\""
"{\"created_on\":\"2008-09-12T20:51:43+00:00\"}"

JRuby

[JSON::Pure::Parser, JSON::Pure::Generator]
"\"2008-09-12T20:52:02+00:00\""
/home/share/storage/jruby-1.1.4/lib/ruby/gems/1.8/gems/json_pure-1.1.3/lib/json/pure/generator.rb:251:in `to_json': wrong # of arguments(2 for 0) (ArgumentError)
	from /home/share/storage/jruby-1.1.4/lib/ruby/gems/1.8/gems/json_pure-1.1.3/lib/json/pure/generator.rb:251:in `json_transform'
	from /home/share/storage/jruby-1.1.4/lib/ruby/gems/1.8/gems/json_pure-1.1.3/lib/json/pure/generator.rb:245:in `each'
	from /home/share/storage/jruby-1.1.4/lib/ruby/gems/1.8/gems/json_pure-1.1.3/lib/json/pure/generator.rb:245:in `map'
	from /home/share/storage/jruby-1.1.4/lib/ruby/gems/1.8/gems/json_pure-1.1.3/lib/json/pure/generator.rb:245:in `json_transform'
	from /home/share/storage/jruby-1.1.4/lib/ruby/gems/1.8/gems/json_pure-1.1.3/lib/json/pure/generator.rb:218:in `to_json'
	from time_extlib_to_json.rb:9

EH?! Now why in the world would it give me an error? Everything was fine up until this point. Why was it giving me a “wrong # of arguments(2 for 0) (ArgumentError)” exception?

Well, I had a great time with JRuby debugging sessions looking at the json_transform method in json_pure’s generator.rb.

Just like the exception said, I came to see that the Time.to_json method no longer accepted two arguments…even though the json_transform method was trying to pass them in through the s << value.to_json(state, depth + 1) line. When the exception was thrown, the Time instance, aka value, had a to_json method that took 0 arguments.

          def json_transform(state, depth)
            delim = ','
            delim << state.object_nl if state
            result = '{'
            result << state.object_nl if state
            result << map { |key,value|
              s = json_shift(state, depth + 1)
              s << key.to_s.to_json(state, depth + 1)
              s << state.space_before if state
              s << ':'
              s << state.space if state
              s << value.to_json(state, depth + 1)
            }.join(delim)
            result << state.object_nl if state
            result << json_shift(state, depth)
            result << '}'
            result
          end

But why was this only showing up after I put a Time instance into a Hash? Well, the json_transform method call in generator.rb is in the module Hash, and it’s responsible for calling to_json with two arguments, state and depth + 1, to convert a Hash key’s value.

module JSON
  module Pure
    module Generator
      module GeneratorMethods
        module Hash
          def json_transform(state, depth)

Nice. So now what? Well, I tried requiring ‘json/add/core’ and running it under JRuby…

require 'rubygems'
require 'extlib'
require 'json'
require 'json/add/core'

p [ JSON.parser, JSON.generator ]
p Time.now.to_json

h = {"created_on" => Time.now}
p h.to_json

It works because ‘json/add/core’ gives Time.to_json a variable number of arguments again. But it gave this JSON output:

JRuby

[JSON::Pure::Parser, JSON::Pure::Generator]
"{\"json_class\":\"Time\",\"s\":1221254881,\"n\":82427000}"
"{\"created_on\":{\"json_class\":\"Time\",\"s\":1221254881,\"n\":83467000}}"

I’m not into that Time format at all!
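If that output ever has to be consumed as-is, it can be turned back into an ISO string after the fact. A minimal sketch, assuming that "s" holds whole seconds since the epoch and "n" holds nanoseconds; iso_from_core_time is a hypothetical helper name, not part of any gem:

```ruby
require 'rubygems'
require 'json'
require 'time'  # Time#xmlschema (ISO 8601)

# Converts the {"json_class":"Time","s":...,"n":...} form to an ISO 8601 string.
# Assumption: "s" is seconds since the epoch, "n" is nanoseconds.
def iso_from_core_time(json)
  # Parse as a plain Hash; do not let the parser instantiate json_class.
  fields = JSON.parse(json, :create_additions => false)
  # Time.at takes microseconds as its second argument, hence the division.
  Time.at(fields["s"], fields["n"] / 1000.0).utc.xmlschema
end

p iso_from_core_time('{"json_class":"Time","s":1221254881,"n":82427000}')
```

This only reformats a value that was already serialized; it does not change what Time.to_json emits.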

If I require ‘extlib’ after ‘json’ and ‘json/add/core’, I get the same error of course.

Maybe I’ll just drop using the gem that requires extlib and override Time.to_json myself…That doesn’t seem right since that gem has other functionality that I need.

Right now, I feel like this is both an extlib and json_pure issue. extlib’s Time.to_json doesn’t accept a variable number of arguments, and the ‘json/add/core’ time format isn’t what I was looking for.
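For comparison, here is what a Time.to_json that accepts a variable number of arguments and still produces an ISO string could look like. This is a sketch under my own assumptions, not extlib's code and not the json_pure patch; note that Time#xmlschema prints UTC as "Z" rather than the "+00:00" shown above:

```ruby
require 'rubygems'
require 'json'
require 'time'  # Time#xmlschema (ISO 8601)

class Time
  # Accept whatever the generator passes (nothing, a state, or state and depth)
  # and forward it to String#to_json.
  def to_json(*args)
    xmlschema.to_json(*args)
  end
end

t = Time.utc(2008, 9, 12, 20, 52, 2)
p t.to_json
p({ "created_on" => t }.to_json)
```

Because the arguments are forwarded untouched, the same definition works whether to_json is called directly or from inside a Hash or Array.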

Hmmm…

Update

Actually, I take back what I said about it being both an extlib and a json_pure issue. This might just be a case of code coming together and clashing. :-(

Update 2

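I guess eating helps clear the head! It looks like it’s a json_pure issue since it happens on both JRuby and the MRI when the script requires ‘json/pure’. Here are the parts of the C extension that differ from the pure Ruby generator. When no state is passed, hash_to_json_i calls to_json on each value with zero arguments, so a zero-argument Time.to_json never fails there: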

static VALUE mHash_to_json(int argc, VALUE *argv, VALUE self)
{
    VALUE Vstate, Vdepth, result;
    long depth;

    rb_scan_args(argc, argv, "02", &Vstate, &Vdepth);
    depth = NIL_P(Vdepth) ? 0 : FIX2LONG(Vdepth);
    if (NIL_P(Vstate)) {
        long len = RHASH(self)->tbl->num_entries;
        result = rb_str_buf_new(len);
        rb_str_buf_cat2(result, "{");
        rb_hash_foreach(self, hash_to_json_i, result);
        rb_str_buf_cat2(result, "}");
    } else {
        GET_STATE(Vstate);
        check_max_nesting(state, depth);
        if (state->check_circular) {
            VALUE self_id = rb_obj_id(self);
            if (RTEST(rb_hash_aref(state->seen, self_id))) {
                rb_raise(eCircularDatastructure,
                        "circular data structures not supported!");
            }
            rb_hash_aset(state->seen, self_id, Qtrue);
            result = mHash_json_transfrom(self, Vstate, LONG2FIX(depth));
            rb_hash_delete(state->seen, self_id);
        } else {
            result = mHash_json_transfrom(self, Vstate, LONG2FIX(depth));
        }
    }
    OBJ_INFECT(result, self);
    return result;
}

static int hash_to_json_i(VALUE key, VALUE value, VALUE buf)
{
    VALUE tmp;

    if (key == Qundef) return ST_CONTINUE;
    if (RSTRING_LEN(buf) > 1) rb_str_buf_cat2(buf, ",");
    tmp = rb_funcall(rb_funcall(key, i_to_s, 0), i_to_json, 0);
    Check_Type(tmp, T_STRING);
    rb_str_buf_append(buf, tmp);
    OBJ_INFECT(buf, tmp);
    rb_str_buf_cat2(buf, ":");
    tmp = rb_funcall(value, i_to_json, 0);
    Check_Type(tmp, T_STRING);
    rb_str_buf_append(buf, tmp);
    OBJ_INFECT(buf, tmp);

    return ST_CONTINUE;
}
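The two call styles can be simulated in plain Ruby. This is a simulation, not the library's code; ZeroArgStamp, ext_style, and pure_style are hypothetical names used only for illustration:

```ruby
# Hypothetical value whose to_json takes no arguments,
# like the Time in the failing script.
class ZeroArgStamp
  def to_json
    '"2008-09-12T20:52:02+00:00"'
  end
end

# Mimics the C extension's stateless path: rb_funcall(value, i_to_json, 0).
def ext_style(value)
  value.to_json
end

# Mimics the pure generator: value.to_json(state, depth + 1).
def pure_style(value, state = nil, depth = 0)
  value.to_json(state, depth + 1)
end

stamp = ZeroArgStamp.new
p ext_style(stamp)  # zero arguments passed, so the call succeeds

begin
  pure_style(stamp)
rescue ArgumentError => e
  p e.message        # two arguments passed to a zero-argument method
end
```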

Update 3

Follow-up post:
json_pure 1.1.3 patch

Feedback is welcome!

CouchDB 0.8.1 Simple One-Way Replication

CouchDB 0.8.1 has been available for a few weeks, so I decided to see how it was coming along. For no real reason at all, I wanted to get a very simple replication process going between two CouchDB databases hosted on different machines. When I say “simple”, I mean “don’t get all excited because this is not a post about replicating all the documents reviewed in a Big Tobacco lawsuit!” Not that you would get excited about that either. I just wanted to see a few documents show up in another database on another machine. Simple enough.

The CouchDB website, wiki, and mailing list archives were actually pretty good for getting started.

Easy Install

Thankfully, CouchDB has some pretty good installation information on its wiki and in its source distribution README file. In a short amount of time, I had a working CouchDB server on Ubuntu and OS X.

Ubuntu Hardy Heron 8.04 build and install

The CouchDB wiki page for Ubuntu contains links to the two following blog posts. I followed the Barking Iguana instructions.

http://barkingiguana.com/2008/06/28/installing-couchdb-080-on-ubuntu-804
http://www.chetanmittal.com/2008/6/15/install-couchdb-on-ubuntu-hardy-heron-8-04

Notes:
The Barking Iguana update-rc.d step and the init.d copy step should be switched. Also, I would add the couchdb user as a system user. The CouchDB source distribution README file has this step in its instructions, but the home directory should be /usr/local/var/lib/couchdb.

sudo adduser --system --home /usr/local/var/lib/couchdb --no-create-home \
--shell /bin/bash --group --gecos "CouchDB Administrator" couchdb

To allow CouchDB to listen for external requests, modify the BindAddress in the /usr/local/etc/couchdb/couch.ini file, and update any firewall settings as needed:

;original
;BindAddress=127.0.0.1
;modified
BindAddress=0.0.0.0

Bind the CouchDB server to an externally available IP address or to 0.0.0.0.

OS X Leopard binary

http://jan.prima.de/~jan/plok/archives/142-CouchDBX-Revival.html

Thanks to Jan L, one of the Apache CouchDB committers, CouchDBX is the easiest way to get a CouchDB server running on OS X. To allow CouchDB to listen for external requests, view the CouchDBX.app package contents, edit the /Contents/Resources/CouchDb/etc/couchdb/couch.ini file, and update the OS X firewall as needed.

Replicating 3 Documents

CouchDB comes with the Futon Utility Client, a browser-based user interface to set up databases, add/update/delete documents, run a CouchDB test suite, and manually initiate replication through its “Replicator” page. The Replicator tool offers replication from a source to a target database, hosted locally or remotely available through HTTP. Pretty straightforward to use. Strangely, error messages are displayed in JSON format through a JavaScript alert, but it’s not a big deal.

So, at this point, I had two CouchDB servers on two different machines. I created a database on each server by using the Futon client and used the same name for both databases. I then proceeded to add 3 documents with randomly added fields and values into my designated “source” database. Now that I had servers, databases, and documents, it was time to figure out how to replicate the database automatically…

Lo and behold, I found out that CouchDB currently doesn’t offer “automated” replication without writing some code. Ok, fair enough. To get things started, I modified the couch.ini file by adding a line for DbUpdateNotificationProcess. More information can be found on the CouchDB wiki page for updating document views.

DbUpdateNotificationProcess=/usr/local/var/lib/couchdb/potatoe.rb

Why is the information on a page about updating document views? Well, the notification hook can be used to update views before a user actually queries those views. The page contains an example Ruby script that does just that. In addition, the same notification hook can be used for kicking off full-text indexing. But I had simpler (useless) needs.

CouchDB will take care of starting the process and will write a short JSON message to the process’s standard input each time a local database is updated. Make sure that the couchdb user has the proper privileges to run the script/executable.

# example database update notification JSON message
{"type":"updated","db":"mytestdb"}

Anyway, here’s the (simple || stupid) script that got one-way replication going between two databases containing 3 documents hosted on 2 different machines sitting 3 feet apart. It uses the RestClient gem.

#!/usr/bin/ruby

require 'rubygems'
require 'logger'
require 'json' # sudo gem install json
#require 'json/pure' # sudo gem install json_pure
require 'rest_client' # sudo gem install rest-client

logger = Logger.new('/usr/local/var/log/couchdb/potatoe.log', 3, 1024000)
logger.level = Logger::INFO

REPLICATE = "http://192.168.0.4:5984/_replicate"
SOURCE = "http://192.168.0.4:5984/mytestdb"
TARGET = "http://192.168.0.2:5984/mytestdb"

begin

  logger.info "Ready for CouchDB..."
  replicate = RestClient::Resource.new REPLICATE
  replicationMsg = {:source => SOURCE, :target => TARGET}.to_json

  loop do
    unless (jsonOut = gets).nil?
      logger.debug jsonOut
      message = JSON.parse jsonOut

      if message["type"] == "updated" and message["db"] == "mytestdb"
        logger.info "'#{message['db']}' database updated."
        logger.info "Replicating..."
        response = replicate.post replicationMsg,
                                  :content_type => 'application/json'
        logger.debug response
        results = JSON.parse response
        if results["ok"]
          logger.info "Replication succeeded. " +
                      "session_id: #{results['session_id']} " +
                      "source_last_seq: #{results['source_last_seq']}"
        else
          # Currently, CouchDB 0.8.1 doesn't work this way.
          # It returns an HTTP 500 error instead of false.
          logger.info "Replication error: #{results}"
        end

      end
    else
      logger.info "CouchDB has gone away..."
      break
    end
  end
rescue Exception => e
  logger.error "Error message: #{e.message}"
  logger.error "Stack trace: #{e.backtrace.inspect}"
ensure
  logger.close
end

Basically, the script makes sure that it cares about the database that was updated and POSTs a JSON string to the CouchDB server’s _replicate “resource” (I know…calling it a resource doesn’t make sense). This initiates the replication process between the SOURCE and TARGET databases. The script doesn’t stagger the updates: it starts the replication process after every database update, so replication will occur each time a document is updated in the source database. And it doesn’t do anything if the target database is unavailable…fault-tolerance-shmolerance!
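If replicating after every single update is too chatty, one option is to space the replications out. A minimal sketch, assuming a hypothetical ReplicationThrottle helper (not part of CouchDB or RestClient) that allows at most one replication per interval:

```ruby
# Hypothetical helper: allows a replication at most once per min_interval seconds.
class ReplicationThrottle
  # clock is injectable so the behavior can be checked without waiting.
  def initialize(min_interval, clock = lambda { Time.now.to_f })
    @min_interval = min_interval
    @clock = clock
    @last_run = nil
  end

  # True if no replication has run yet, or if at least min_interval
  # seconds have passed since the last one that was allowed.
  def ready?
    now = @clock.call
    return false if @last_run && (now - @last_run) < @min_interval
    @last_run = now
    true
  end
end

throttle = ReplicationThrottle.new(30)
# Inside the loop, in place of the unconditional POST:
#   if throttle.ready?
#     response = replicate.post replicationMsg, :content_type => 'application/json'
#   end
```

The trade-off is that an update arriving inside the interval is skipped, so it only reaches the target when a later update triggers the next replication.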

With the script in place, permissions all set, and the CouchDB server restarted, all I had to do was update a document in the source database to initiate the replication process.

That’s it.

Yes, that’s it. Three documents were replicated on the other machine. Seriously, it was everything I had hoped for, and more. :-)

I’m sure other people are going to replicate millions of documents across thousands of CouchDB servers and databases on all types of hardware and networks. With documents containing a ridiculous number of fields and content. And maybe using another CouchDB database to store a list of databases to replicate, and replicating that database as well. And obviously all in Erlang too…but not me, at least not today.

Notes:

  • The _replicate response message contains the “ok” result as well as a history of replication events. I found out through the CouchDB mailing list that the response will contain information about the last 50 replication events. However, I noticed that the returned events were unique to the client that initiated the replication. I didn’t see the same history when I kicked off replication through a Ruby irb session, the DbUpdateNotificationProcess script, or through the Futon client.
  • You can also do a hot-backup copy of the database files if you don’t need “replication” like this and if your database files are small.

Update

I placed a JRuby script on GitHub that does batch and timed replications:
Jpotatoe: A simple JRuby script for one-way CouchDB 0.8.1 replication

Feedback is welcome.
