From 20 Seconds to 6: A Guide to Ruby Concurrency

Introduction: Beyond the Quick Fix

In engineering, it’s easy to focus on a tactical solution that meets immediate requirements. A script is slow; we make it faster. But a principal engineer’s role is to look beyond the tactical and evaluate the strategic, long-term implications of a design choice. This post analyzes a real-world performance bottleneck, not just to solve it, but to illustrate a durable architectural pattern for handling concurrency in I/O-bound systems.

The challenge: a workflow of dependent API calls that took 19.5 seconds. The goal: under 7 seconds. The result: a robust solution that finishes in 6 seconds and, more importantly, provides a scalable and maintainable concurrency model.

Identifying the Bottleneck: A Familiar Story

The problem originates with a series of web requests that are executed sequentially. This is a classic anti-pattern in any system dealing with network I/O. The total execution time becomes the sum of all latencies.

The System Constraints:

Service A: 1s latency, 3 concurrent requests max.
Service B: 2s latency, 2 concurrent requests max.
Service C: 1s latency, 1 concurrent request max.

The initial implementation was functionally correct but architecturally naive: it issued every request one after another, even where independent requests could have overlapped.

# The original, sequential implementation.
# Total time: (3 * 1s) + 2s + 1s + (3 * 1s) + 2s + 1s + (3 * 1s) + 2s + 1s + 1s = 19s of latency (about 19.5s measured)
def original_implementation
  # ... full implementation from previous example ...
end

This sequential approach makes every request wait on the one before it, whether or not a real dependency exists, and that is where most of the 19.5 seconds goes.
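
The arithmetic in the timing comment above can be checked with a few lines of Ruby. This is only a sketch: the call order is read off that comment, `LATENCY`, `ROUND`, and `CALLS` are illustrative names, and assigning the final 1-second call to Service C is an assumption (the comment gives only its latency).

```ruby
# Per-request latency in seconds, from the constraints listed earlier.
LATENCY = { a: 1, b: 2, c: 1 }.freeze

# One round as it appears in the timing comment: three A calls, one B, one C.
ROUND = %i[a a a b c].freeze

# Three rounds plus one trailing 1s call (assumed here to be Service C).
CALLS = (ROUND * 3 + [:c]).freeze

# Sequential execution: every request waits for the previous one.
def sequential_seconds(calls)
  calls.sum { |service| LATENCY.fetch(service) }
end

# If a batch of independent calls overlaps, the batch costs its slowest call.
def overlapped_seconds(calls)
  calls.map { |service| LATENCY.fetch(service) }.max
end

puts sequential_seconds(CALLS)     # sum of all 16 latencies
puts overlapped_seconds(%i[a a a]) # three A calls in flight at once
```

The gap between a batch’s sum and its maximum is the time that concurrency can recover.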

Evaluating Concurrency Models: A Tale of Two Approaches

To address this, we must introduce concurrency. In Ruby, for I/O-bound tasks like these, several options exist. Let’s analyze two: raw Threads and the async library.

Approach 1: The Tactical Solution with `Thread`

The most direct way to introduce concurrency is Ruby’s Thread class. Wrapping each request in Thread.new lets the requests run concurrently, so their network waits overlap instead of adding up.

# A solution using manually orchestrated threads.
def threads
  # ... full implementation from previous example ...
end
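
As a minimal illustration of the idea (not the elided implementation above), the sketch below overlaps three simulated requests with threads. `fake_request` and `fetch_concurrently` are hypothetical helpers, and `sleep` stands in for network latency, scaled down so the demo runs quickly.

```ruby
# Stand-in for an HTTP request: sleeps for the given latency, then returns a label.
def fake_request(name, latency)
  sleep(latency)
  "#{name} done"
end

# Starts one thread per request, then collects the results in order.
def fetch_concurrently(names, latency)
  threads = names.map { |name| Thread.new { fake_request(name, latency) } }
  # Thread#value joins the thread and returns the value of its block.
  threads.map(&:value)
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = fetch_concurrently(%w[a1 a2 a3], 0.1)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results.inspect
puts format("elapsed: %.2fs", elapsed)
```

Because the three sleeps overlap, the elapsed time should be close to one latency rather than three. Nothing in this sketch enforces a concurrency limit; that is left entirely to the caller.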

Architectural Critique:

While this solution works and meets the performance target, it’s a brittle, tactical fix. From a principal engineer’s standpoint, it has several significant drawbacks:

Cognitive Overhead: The developer is now a manual scheduler, responsible for orchestrating a complex dependency graph with join calls. This logic is imperative, hard to reason about, and prone to deadlocks or race conditions as complexity grows.
Poor Error Handling: What happens if a single thread fails? By default, an exception raised inside a thread reaches the caller only when that thread is joined (via join or value). Without a supervisory structure, sibling threads keep running in the meantime, and a failure can leave the system in an indeterminate state.
Resource Management: This implementation doesn’t explicitly manage the concurrency limits. It relies on a carefully timed sequence of joins to implicitly stay within the rules. A small logic change could easily exceed a service’s concurrency limit. It’s an accident waiting to happen.
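
The error-handling point can be made concrete with a small sketch. `failing_worker` and `collect` are hypothetical helpers, and the `IOError` stands in for a request that times out.

```ruby
# Hypothetical worker that fails, standing in for a request that times out.
def failing_worker
  Thread.new do
    # Suppress Ruby's default stderr report so the demo output stays quiet.
    Thread.current.report_on_exception = false
    raise IOError, "service B timed out"
  end
end

# The exception reaches the caller only here, when the thread is joined.
def collect(thread)
  thread.value
rescue IOError => e
  "rescued at join: #{e.message}"
end

unjoined = failing_worker
sleep 0.01 while unjoined.alive? # wait for it to die without joining it
puts unjoined.status.inspect     # nil means the thread died with an exception
puts collect(failing_worker)
```

A thread that is never joined fails without the caller noticing, apart from a message on stderr, which is the gap a supervisory structure is meant to close.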

This approach is a code smell. It solves the immediate problem but creates a future maintenance burden.

Approach 2: The Strategic Solution with `async` and Structured Concurrency

A more robust architectural choice is to use a library that provides structured concurrency. The socketry/async gem is an excellent example. It allows us to define what we want to happen, not how to schedule it.

The core of this solution is the Async::Semaphore, a classic concurrency primitive for limiting access to a shared resource.

# The architecturally sound solution.
require "async"
require "async/semaphore"

def async
  Async do
    semaphore_a = Async::Semaphore.new(3) # Limit for Service A
    semaphore_b = Async::Semaphore.new(2) # Limit for Service B

    # ... tasks are launched within the semaphores ...
  end
end

Architectural Advantages:

Declarative & Maintainable: The intent is clear. We declare our resource limits upfront. The tasks are launched, and the library’s scheduler handles the rest. Adding a new dependency or changing a limit is a trivial, localized modification.
Structured Concurrency: The Async block creates a supervised scope. Tasks started inside it are children of that scope: stopping the parent stops its children, and an exception raised in a task is re-raised when that task is waited on, so errors propagate predictably. This is fundamental to building resilient systems.
Explicit Resource Management: The use of semaphores makes resource constraints a first-class part of the design. It’s a durable pattern that can be applied to database connection pools, third-party API rate limits, or any other constrained resource.
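
The semaphore pattern itself is not tied to one library. As a rough sketch of the same idea using only core Ruby, a counting semaphore can be built on `SizedQueue`. `CountingSemaphore` and `run_jobs` are hypothetical names, and the `sleep` stands in for a query or API call.

```ruby
# A minimal counting semaphore on top of SizedQueue (core Ruby, no gems).
class CountingSemaphore
  def initialize(limit)
    @slots = SizedQueue.new(limit)
  end

  # Blocks while all slots are taken, then runs the block holding one slot.
  def acquire
    @slots.push(:slot)
    begin
      yield
    ensure
      @slots.pop # always release the slot, even if the block raises
    end
  end
end

# Runs `jobs` simulated calls on threads, at most `limit` at a time.
def run_jobs(limit:, jobs:, latency: 0.02)
  semaphore = CountingSemaphore.new(limit)
  lock = Mutex.new
  in_flight = 0
  peak = 0

  threads = jobs.times.map do |i|
    Thread.new do
      semaphore.acquire do
        lock.synchronize { in_flight += 1; peak = [peak, in_flight].max }
        sleep(latency) # stand-in for a query or API call
        lock.synchronize { in_flight -= 1 }
        i
      end
    end
  end

  [threads.map(&:value), peak]
end

results, peak = run_jobs(limit: 2, jobs: 6)
puts "results=#{results.inspect} peak=#{peak}"
```

Compared with the async version, the thread-based sketch needs its own locking around shared counters, which is part of the overhead the structured approach removes.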

Broader Implications & Architectural Takeaways

The exercise of reducing this script’s runtime reveals several key principles applicable to any large-scale system:

Concurrency is an Architectural Concern: It’s not an optimization to be sprinkled on later. The choice of a concurrency model has cascading effects on error handling, resource management, and code maintainability.
Embrace Structured Concurrency: Avoid raw, unmanaged threads. Use modern libraries and patterns that provide task supervision, clean cancellation, and predictable error propagation. This is a pillar of modern, resilient system design.
Understand the Nature of Your Bottleneck: This solution is effective because the work is I/O-bound. On CRuby, the Global VM Lock (GVL) allows only one thread to execute Ruby code at a time; it is released while a thread waits on I/O, which is why the waits can overlap. The same strategy would bring little or no speedup for CPU-bound tasks. A principal engineer must understand the platform’s execution model to choose the right tool for the job.
Build for Observability: In a real system, how would we have known which service was the bottleneck? Proper instrumentation—metrics on request latency, semaphore contention, and task throughput—is essential for identifying and diagnosing these issues in production.
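
As one possible starting point for the instrumentation mentioned above, the sketch below records per-service request latency in memory. `LatencyRecorder` and its methods are hypothetical names; a real system would more likely forward these measurements to a metrics backend.

```ruby
# Minimal in-process latency recorder, safe to call from multiple threads.
class LatencyRecorder
  def initialize
    @samples = Hash.new { |hash, key| hash[key] = [] }
    @lock = Mutex.new
  end

  # Times the block, records the duration under `label`,
  # and returns the block's result. Failures are recorded too.
  def measure(label)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @lock.synchronize { @samples[label] << elapsed }
  end

  # Number of recorded samples for a label (0 if none).
  def count(label)
    @lock.synchronize { @samples.fetch(label, []).size }
  end

  # Label with the highest mean latency, or nil if nothing was recorded.
  def slowest
    @lock.synchronize do
      @samples.max_by { |_label, durations| durations.sum / durations.size }&.first
    end
  end
end

recorder = LatencyRecorder.new
recorder.measure(:service_a) { sleep 0.01 }
recorder.measure(:service_b) { sleep 0.03 }
puts recorder.slowest.inspect
```

Wrapping each request in `measure` is enough to answer the question of which service dominates the runtime; semaphore contention and throughput would need additional counters.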

Conclusion

We successfully reduced the execution time from 19.5 seconds to 6 seconds. But the real victory isn’t the raw performance gain; it’s the adoption of an architectural pattern that is more resilient, scalable, and maintainable. By moving from a tactical fix to a strategic concurrency model, we’ve built a solution that is not only faster but fundamentally better. That is the essence of the principal engineer’s mindset.
