snax

scaling rails

ree

We recently migrated Twitter from a custom Ruby 1.8.6 build to a Ruby Enterprise Edition release candidate, courtesy of Phusion. Our primary motivation was the integration of Brent's MBARI patches, which increase memory stability.

Some features of REE have no effect on our codebase, but we definitely benefit from the MBARI patchset, the Railsbench tunable GC, and the various leak fixes in 1.8.7p174. These are difficult to integrate, and Phusion has done a fine job.

testing notes

I ran into an interesting issue. Ruby is faster if compiled with -Os (optimize for size) than with -O2 or -O3 (optimize for speed). Hongli pointed out that Ruby has poor instruction locality and benefits most from squeezing tightly into the instruction cache. This is an unusual phenomenon, although probably more common in interpreters and virtual machines than in "standard" C programs.

I also tested a build that included Joe Damato's heaped thread frames, but it would hang Mongrel in rb_thread_schedule() after the first GC run, which is not exactly what we want. Hopefully this can be integrated later.

benchmarks

I ran a suite of benchmarks via Autobench/httperf and plotted them with Plot. The hardware was a 4-core Xeon machine with RHEL5, running 8 Mongrels balanced behind Apache 2.2. I made a typical API request that is answered primarily from composed caches.
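For readers who want to try a similar sweep, here is a minimal sketch of how the per-rate httperf invocations might be generated. It is not the harness used for these benchmarks: the host name, URI, rates, and connection counts are hypothetical placeholders, and the function only builds the argument lists without running anything.

```python
def httperf_commands(host, uri, rates, port=80, conns_per_rate=1000, timeout=5):
    """Return one httperf argument list per target request rate.

    All default values here are illustrative assumptions, not the
    settings used in the benchmarks described in this post.
    """
    commands = []
    for rate in rates:
        commands.append([
            "httperf",
            "--server", host,
            "--port", str(port),
            "--uri", uri,
            "--rate", str(rate),                  # new connections per second
            "--num-conns", str(conns_per_rate),   # total connections for this step
            "--num-calls", "1",                   # one request per connection
            "--timeout", str(timeout),            # seconds before a request counts as failed
        ])
    return commands


# Hypothetical host and endpoint, stepping the request rate upward.
sweep = httperf_commands("app.example.test", "/hypothetical/endpoint.json",
                         rates=[50, 100, 150])
for argv in sweep:
    print(" ".join(argv))
```

Each list can be handed to `subprocess.run` on a machine with httperf installed. Autobench automates this kind of stepping and collects the results for you.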

As usual, we see that tuning the GC parameters has the greatest impact on throughput, but there is a definite gain from switching to the REE bundle. It's also interesting how much the standard deviation is improved by the GC settings. (Some data points are skipped due to errors at high concurrency.)
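To make the standard deviation comparison concrete, here is a small sketch of how throughput samples from two configurations could be summarized. The numbers are invented purely for illustration and are not taken from the benchmark runs above.

```python
import statistics


def summarize(reply_rates):
    """Return (mean, sample standard deviation) for replies-per-second samples."""
    mean = statistics.mean(reply_rates)
    # stdev needs at least two samples; report 0.0 for a single data point.
    stdev = statistics.stdev(reply_rates) if len(reply_rates) > 1 else 0.0
    return mean, stdev


# Invented sample data: NOT measurements from this post.
untuned_gc = [180.0, 140.0, 220.0, 160.0]
tuned_gc = [250.0, 245.0, 255.0, 250.0]

for label, samples in (("untuned", untuned_gc), ("tuned", tuned_gc)):
    mean, stdev = summarize(samples)
    print(f"{label}: mean={mean:.1f} req/s, stdev={stdev:.1f}")
```

A lower standard deviation at a similar or higher mean is what "more consistent throughput" looks like in the raw numbers.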

upgrading

Moving from 1.8.6 to REE 1.8.7 was trivial, but moving to 1.9 will be more of an ordeal. It will be interesting to see what patches are still necessary on 1.9. Many of them are getting upstreamed, but some things (such as tcmalloc) will probably remain available only from third parties.

All in all, good times in MRI land.

September 24, 2009

12 comments

Luke Melia says (September 24, 2009):

Thanks for the great writeup, Evan, and for sharing the behind the scenes data. We used your earlier writeup about GC tuning to double the performance of Weplay and it's great to see REE 1.8.7 getting a Twitter-sized workout.

Antti-Ville Tuunainen says (September 24, 2009):

"This is an unusual phenomenon..."

More like the standard way things work these days. Every time I have tested the various settings with real, non-trivial programs, -Os wins, often by a huge margin. The speed difference between L1i and L2 is just massive for instruction fetches.

Chad says (September 24, 2009):

Is REE 1.8.7 publicly available? I only see 1.8.6 on the homepage and Github.

evan says (September 24, 2009):

Antti-Ville: Yeah, a lot of people are saying this is normal. I'm going to try recompiling some gems with -Os too.

Chad: Not yet. Soon.

Ashwin Jayaprakash says (September 25, 2009):

Wasn't there an article a while ago about Twitter moving to Scala/JVM? Or is Scala not used in production?

ryan king says (September 25, 2009):

We use Scala for a few things at Twitter, but the majority of the site is Ruby.

ehsanul says (September 25, 2009):

Nice writeup. Have you guys considered migrating to JRuby?

Mark Turner says (September 25, 2009):

Was 1.8.6p287 shipped with RHEL5, or did you use the source?

Attila Szegedi says (September 25, 2009):

Back in my C++ programming days, it was common wisdom that -Os produces code that actually runs faster than -O2. Since CPU frequencies are very high, machine code execution has been completely dominated by cache misses for at least a decade. Making sure your hot execution path fits nicely into the CPU cache, with some space to spare for the data, gives you the best performance.

evan says (September 25, 2009):

ehsanul: See here.

Mark: It was a patched version from source in order to include the Railsbench GC optimizations.

DAddYE says (September 26, 2009):

When you build REE, are you enabling the MBARI API and pthread? Also, why not Thin instead of Mongrel?
