bundler/definition.rb

diagram image


Bundler#definition

diagram image

As we can see, Definition.build takes a long time to run.


Definition.build

diagram image

From here we can see that Dsl.evaluate takes the most time.


Dsl.evaluate

diagram image

We can see that the time is split between eval_gemfile and to_definition.


builder.eval_gemfile

diagram image

Here we can see that taking the contents of the Gemfile and running instance_eval on them costs about 55ms.

Digging into the instance_eval a little more with TracePoint, we can see hundreds of small method calls, starting with Dsl#source. The trace looks approximately like this:

[161, Bundler::Dsl, :source, :call]
[336, Bundler::Dsl, :normalize_hash, :call]
[435, Bundler::Dsl, :normalize_source, :call]
[449, Bundler::Dsl, :check_primary_source_safety, :call]
[90, Bundler::SourceList, :rubygems_primary_remotes, :call]
[38, Bundler::SourceList, :add_rubygems_remote, :call]
[210, Bundler::Source::Rubygems, :add_remote, :call]
...
[115, Bundler::SourceList, :warn_on_git_protocol, :call]
[245, #<Class:Bundler>, :settings, :call]
[54, Bundler::Settings, :[], :call]
[224, Bundler::Settings, :key_for, :call]
[325, Bundler::Dsl, :with_source, :call]
[79, Bundler::Dependency, :initialize, :call]
[38, Gem::Dependency, :initialize, :call]
...
[54, #<Class:Gem::Requirement>, :create, :call]
[123, Gem::Requirement, :initialize, :call]
[121, Bundler::Dsl, :gem, :call]
[347, Bundler::Dsl, :normalize_options, :call]
[336, Bundler::Dsl, :normalize_hash, :call]
[343, Bundler::Dsl, :valid_keys, :call]
[418, Bundler::Dsl, :validate_keys, :call]
[209, Bundler::Dsl, :git, :call]
[336, Bundler::Dsl, :normalize_hash, :call]
[24, Bundler::SourceList, :add_git_source, :call]
[13, Bundler::Source::Git, :initialize, :call]
[96, Bundler::SourceList, :add_source_to_list, :call]
[49, Bundler::Source::Git, :hash, :call]
[79, Bundler::Source::Git, :name, :call]
[49, Bundler::Source::Git, :hash, :call]
[79, Bundler::Source::Git, :name, :call]
... repeat the last block a lot, particularly Bundler::Source::Git calls ...
[115, Bundler::SourceList, :warn_on_git_protocol, :call]
[245, #<Class:Bundler>, :settings, :call]
[54, Bundler::Settings, :[], :call]
[224, Bundler::Settings, :key_for, :call]
[325, Bundler::Dsl, :with_source, :call]
[79, Bundler::Dependency, :initialize, :call]
[38, Gem::Dependency, :initialize, :call]
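
A trace in this shape can be collected with a small TracePoint block. The sketch below is not the script used for the trace above: TinyDsl is a hypothetical stand-in for Bundler::Dsl, and only the [line, class, method, event] layout is taken from the output shown.

```ruby
# Hypothetical stand-in for a DSL whose public methods call small helpers.
class TinyDsl
  def source(url)
    normalize(url)
  end

  def normalize(url)
    url.to_s
  end
end

# Runs the given block with tracing enabled and returns one
# [line, class, method, event] entry per Ruby-level method call.
def trace_calls
  events = []
  trace = TracePoint.new(:call) do |tp|
    events << [tp.lineno, tp.defined_class, tp.method_id, tp.event]
  end
  trace.enable { yield }
  events
end

TRACE = trace_calls { TinyDsl.new.source("https://rubygems.org") }
TRACE.each { |entry| p entry }
```

The :call event only fires for methods written in Ruby, so methods implemented in C would need :c_call as well.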

Speeding this up would mean optimizing dozens of small methods. Caching does not help either: because of the extensive use of procs and default values in hashes, the class object cannot be cached.

This is a dead end.
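
One way to see the caching problem, assuming the cache would have to serialize the evaluated object with Marshal (the mechanism that was actually tried is not stated here): Ruby refuses to dump procs and hashes that carry a default proc. FakeBuilder is a hypothetical object, not Bundler's Dsl.

```ruby
# Hypothetical object that, like many DSL builders, holds a proc and a
# hash with a default proc.
class FakeBuilder
  def initialize
    @callback = proc { |name| name.to_s }
    @sources  = Hash.new { |hash, key| hash[key] = [] }
  end
end

# Returns the error message if Marshal cannot serialize the object,
# or nil if serialization succeeds.
def marshal_error(object)
  Marshal.dump(object)
  nil
rescue TypeError => e
  e.message
end

puts marshal_error(proc {}).inspect
puts marshal_error(Hash.new { |hash, key| hash[key] = [] }).inspect
puts marshal_error(FakeBuilder.new).inspect
puts marshal_error({ "rack" => ">= 0" }).inspect
```

Only the last call, a plain hash of strings, serializes cleanly; the other three raise TypeError.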


builder.to_definition

This method simply calls Definition.new, so we’ll move to that instead.


Definition.new

diagram image

Taking some of the more “expensive” lines, we can dig a bit deeper to get more accurate numbers.

line                                                    num_calls  time (s)
@locked_gems = LockfileParser.new(@lockfile_contents)   1          0.03465300000971183
@locked_specs = SpecSet.new(@locked_gems.specs)         1          0.002618999977130443
converge_path_sources_to_gemspec_sources                1          0.006308999989414588
@source_changes = converge_sources                      1          0.010037000000011176
@dependency_changes = converge_dependencies             1          0.022082999988924712
fixup_dependency_types!                                 1          0.0025529999984428287
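
Per-line numbers like these can be gathered by wrapping each statement in a monotonic-clock timer. This is a minimal sketch and not the instrumentation used for the table; the timed helper, the TIMINGS constant and the labels are all hypothetical.

```ruby
# Accumulates call counts and elapsed seconds per label.
TIMINGS = Hash.new { |hash, label| hash[label] = { calls: 0, seconds: 0.0 } }

# Runs the block, records how long it took under +label+, and returns the
# block's value so the helper can wrap the right-hand side of an assignment.
def timed(label)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  entry = TIMINGS[label]
  entry[:calls] += 1
  entry[:seconds] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  result
end

sorted = timed("sort") { (1..1_000).to_a.shuffle.sort }
3.times { timed("sum") { (1..1_000).sum } }

TIMINGS.each do |label, entry|
  puts format("%-6s %d %.6f", label, entry[:calls], entry[:seconds])
end
```

Because timed returns the block's value, a line such as @locked_specs = SpecSet.new(...) could be wrapped without changing what gets assigned.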

LockfileParser.new

See lockfile_parser


Definition#converge_dependencies

diagram image

The diagram makes it obvious that one line is the root cause of the slowness: locked_source = @locked_deps.select {|d| d.name == dep.name }.last. It runs 112,812 times for the Shopify application, and every run scans all of @locked_deps, so it could likely benefit from some up-front hashing.
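
The up-front hashing idea could look roughly like the following. It is a sketch built on a hypothetical LockedDep struct, not the code from the pull request; it only shows that a name-keyed hash returns the same entry as select { ... }.last without scanning the whole list on every lookup.

```ruby
# Hypothetical stand-in for a locked dependency.
LockedDep = Struct.new(:name, :requirement)

LOCKED = [
  LockedDep.new("rack", "= 2.0.1"),
  LockedDep.new("rake", "= 12.0.0"),
  LockedDep.new("rack", "= 2.0.3") # duplicate name: the last entry should win
].freeze

# Original shape: walks the whole array on every lookup.
def find_by_scan(deps, name)
  deps.select { |d| d.name == name }.last
end

# Build the index once. Later entries overwrite earlier ones, which
# matches the behaviour of select { ... }.last.
def index_by_name(deps)
  deps.each_with_object({}) { |d, index| index[d.name] = d }
end

INDEX = index_by_name(LOCKED)
p INDEX["rack"]
p find_by_scan(LOCKED, "rack")
```

Building the index costs one pass over the list, after which each lookup is a single hash access.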

This particular line was fixed by this pull request.

After fixing the issue surrounding select, my attention turned to dependency_without_type = proc {|d| Gem::Dependency.new(d.name, *d.requirement.as_list) }, which is run 475 times and takes 16ms. This pull request gives me the context to know that we want to compare name and requirement, but not necessarily anything else.

Let’s look at the documentation for Gem::Dependency to understand how equality works so we don’t regress. The entry for comparison shows the following:

Uses this dependency as a pattern to compare to other. This dependency will match if the name matches the other’s name, and other has only an equal version requirement that satisfies this dependency.

As we can see, only the name and the version requirement need to match. This means we don’t necessarily need to build a new Gem::Dependency, since we only use it for equality. That said, checking for an equal version requirement isn’t particularly easy. Requirements such as 1.0.1 and > 1.0.0 are both valid, but they are not easily compared, so we can’t do something naive like comparing two arrays. Let’s look at what the comparison is actually doing.
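
To make the documented behaviour concrete, here is a small check of Gem::Dependency#=== against RubyGems itself. The gem names and versions are made up for illustration, and the matches? helper is hypothetical.

```ruby
require "rubygems"

# The dependency acting as the pattern, as it might be declared in a Gemfile.
PATTERN = Gem::Dependency.new("rack", "> 1.0.0")

# True when +other+ has the same name and a single "= x.y.z" requirement
# whose version satisfies the pattern.
def matches?(pattern, other)
  pattern === other
end

exact   = Gem::Dependency.new("rack", "1.0.1") # "1.0.1" is shorthand for "= 1.0.1"
too_old = Gem::Dependency.new("rack", "0.9.0")
renamed = Gem::Dependency.new("rake", "1.0.1")

puts matches?(PATTERN, exact)
puts matches?(PATTERN, too_old)
puts matches?(PATTERN, renamed)

# Requirement objects compare by operator and version, not by satisfiability.
puts Gem::Requirement.new("1.0.1") == Gem::Requirement.new("= 1.0.1")
puts Gem::Requirement.new("1.0.1") == Gem::Requirement.new("> 1.0.0")
```

Only the first dependency matches the pattern. The last two lines show why comparing requirements directly is stricter than ===: 1.0.1 satisfies > 1.0.0, yet the two requirements are not equal.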

The comparison makes sure that all dependencies match. We could likely do that with individual comparisons, but we’d want to avoid comparing everything when we don’t have to (that is, bail out with false on the first mismatch). The following block makes sure that every dependency has a corresponding entry in @locked_deps and that the two match.

@dependencies.any? do |dependency|
  locked_dep = @locked_deps[dependency.name]
  next true if locked_dep.nil?
  dependency === locked_dep
end

This results in the following timings:

diagram image

As you can see, we’ve saved about half of the method time.

Running the test that was added in the pull request used for context results in a success!


Actions

  • Convert @locked_deps to a hash and see whether O(1) access improves things. Fixed in this pull request.
  • Avoid using Gem::Dependency just for comparison in converge_dependencies. Fixed in this pull request.
  • Can parse_source in the lockfile parser be faster? Not really; this was a dead end.
  • Look at caching the evaled Gemfile. Not easily possible: the eval has many side effects that change class-level variables, so it would require a large refactor for minimal benefit.
  • Cache the class instance instead? Not possible: due to extensive use of procs and default values in hashes, the class object cannot be cached.