Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Elasticsearch client auto instrumentation #357

Closed
wants to merge 19 commits into from

Conversation

estolfo
Copy link

@estolfo estolfo commented Mar 1, 2023

These changes add instrumentation of the Elasticsearch client. More technically, it instruments the elastic-transport gem, which is a component/dependency of the elasticsearch client.

The instrumentation is inspired by Elastic APM agent's instrumentation, found here.

There are some open questions with this instrumentation. Namely, there are Elasticsearch client auto instrumentations for the Java, Python, .NET OpenTelemetry projects but they are inconsistent with each other. So I have an open issue with the OTel specifications group to add a spec for Elasticsearch client instrumentation. Until that spec is written and approved, this PR will be in flux.

At this point, I think the code can be reviewed for structure and functionality. I will move it from draft status into "ready for review" when the above-mentioned spec is approved.

Note that this instrumentation is only for Elasticsearch client >= v8.0

TODO:

  • Verify the versioning scheme in the auto instrumentation gems. Have I correctly set the initial version to 0.0.1?
  • Spec written and approved by the OTel specs group
  • Complete TODOs in client.rb
  • Find out where documentation of sanitize_field_names should go

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Mar 1, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@estolfo estolfo marked this pull request as draft March 1, 2023 12:23
# 'db.name' => database_name,
'db.operation' => method,
# TODO: is this true for all elasticsearch client requests?
'net.transport' => IP_TCP,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that field isn't that interesting here given the http client itself might be instrumented?
and if that isn't the case than shouldn't the http.* attributes be of more interest here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the common case (something like Faraday) it could possibly be instrumented already. We have had attempts at Elasticsearch instrumentation before, and the approach taken by @plantfansam before is the one I would recommend: last PR

Specifically, OpenTelemetry::Common::HTTP::ClientContext.with_attributes is designed to augment a lower-level HTTP span with attributes we care about from a calling context (like an Elasticsearch client). It's a way to reduce the number of basically-identical spans nesting on top of each other, in the common case of "this client gem is really a thin wrapper around an HTTP client like Faraday".

The previous PR gives the user a choice in the matter, and I think that might still be desirable here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see that there was an PR already for this instrumentation, thanks for pointing that out. I'll take a look at it and see how these can be merged.
When you say that the approach is one you would recommend, you mean that there's something about @plantfansam's approach that is preferable to the one in this PR? If so, I might have some follow-up questions to clarify what exactly you'd prefer to have in the instrumentation as I take a look.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I've taken a look at the other PR and I think it makes a lot of sense to use OpenTelemetry::Common::HTTP::ClientContext.with_attributes. We also have a way of avoiding redundant spans in the Elastic APM agent's Elasticsearch client instrumentation. So I'll add that in, including the config option, so the user has a choice.

Is there anything else about the approach in the other PR that you'd like to include in this one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I've added a config option capture_es_spans to the instrumentation so that the user can decide whether they want the http span to be enriched with elasticsearch instrumentation attributes or if they want separate spans for es and http.

Copy link
Contributor

@ahayworth ahayworth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this! I left a few initial thoughts, but I'll wait until you mark it ready for review before doing much deeper.

The most recent attempt at this was by @plantfansam in open-telemetry/opentelemetry-ruby#1189 - that version is slightly different; perhaps we could harmonize the approaches?


group :test do
gem 'opentelemetry-instrumentation-base', path: '../base'
gem 'pry-nav'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should pry-nav be a dev dependency instead of a test dependency?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll update that

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've kept pry-nav in the test group but added a development group, as I think running tests and debugging with pry is common practice. The rails instrumentation also has pry in test and development, as do some other instrumentations.


option :peer_service, default: nil, validate: :string
option :db_statement, default: :obfuscate, validate: %I[omit obfuscate include]
option :sanitize_field_names, default: nil, validate: ->(v) { v.is_a?(String) || v.is_a?(Array) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since :sanitize_field_names is a plural, I think that implies it should always be an array. So I would change this to:

Suggested change
option :sanitize_field_names, default: nil, validate: ->(v) { v.is_a?(String) || v.is_a?(Array) }
option :sanitize_field_names, default: [], validate: :array

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment here, I'm accepting a string representing a list as well but happy to change this if it's overkill.

Comment on lines 41 to 46
def convert_config(config)
return unless (field_names = config[:sanitize_field_names])

field_names = field_names.is_a?(String) ? field_names.split(',') : Array(field_names)
config[:sanitize_field_names] = field_names.map { |p| WildcardPattern.new(p) }
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you take the suggestion for L37, then this whole method can be simplified to:

Suggested change
def convert_config(config)
return unless (field_names = config[:sanitize_field_names])
field_names = field_names.is_a?(String) ? field_names.split(',') : Array(field_names)
config[:sanitize_field_names] = field_names.map { |p| WildcardPattern.new(p) }
end
def convert_config(config)
config[:sanitize_field_names] = field_names.map { |p| WildcardPattern.new(p) }
end

# 'db.name' => database_name,
'db.operation' => method,
# TODO: is this true for all elasticsearch client requests?
'net.transport' => IP_TCP,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the common case (something like Faraday) it could possibly be instrumented already. We have had attempts at Elasticsearch instrumentation before, and the approach taken by @plantfansam before is the one I would recommend: last PR

Specifically, OpenTelemetry::Common::HTTP::ClientContext.with_attributes is designed to augment a lower-level HTTP span with attributes we care about from a calling context (like an Elasticsearch client). It's a way to reduce the number of basically-identical spans nesting on top of each other, in the common case of "this client gem is really a thin wrapper around an HTTP client like Faraday".

The previous PR gives the user a choice in the matter, and I think that might still be desirable here.

module Instrumentation
module Elasticsearch
# Object representing a user-supplied key pattern that behaves as a regex
class WildcardPattern
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just allow a regex in this case? Is this mimicking an upstream Elasticsearch wildcard pattern? (And, if so, could we just borrow their implementation?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I used a special object for this and didn't require a regex was to allow the config option sanitize_field_names to be set as a list of strings or a string representing a list. So the config option could be set as:
{ sanitize_field_names: 'foo.*,*.bar' } }
or
{ sanitize_field_names: ['Auth*tion', 'abc*', '*xyz'] }

I decided to do that instead of requiring a regex or list of regexes so that the config could be set via an environment variable, if needed. Maybe that's not a valid use case in OTel instrumentations? I'm happy to change this, just let me know what makes more sense for your users.

@github-actions
Copy link
Contributor

github-actions bot commented May 4, 2023

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label May 4, 2023
@estolfo
Copy link
Author

estolfo commented May 4, 2023

Just to recap where we are at this point: I have an open PR with the semantic conventions group with a proposal for Elasticsearch client instrumentation.
There are a few instrumentations that already exist (.NET, Python, Java) and we are checking with those groups to see if they have any concerns.
I should have more info next week, as we are bringing it up in the language SIGs this week. Once that above PR is merged, I can return to this PR and update it and mark is as ready for review.
I believe I've addressed all the comments and questions specific to this PR at this point.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 4, 2023

👋 This pull request has been marked as stale because it has been open with no activity. You can: comment on the issue or remove the stale label to hold stale off for a while, add the keep label to hold stale off permanently, or do nothing. If you do nothing this pull request will be closed eventually by the stale bot

@github-actions github-actions bot added the stale Marks an issue/PR stale label Jun 4, 2023
@ericmustin ericmustin added the keep Ensures stale-bot keeps this issue/PR open label Jun 4, 2023
@ericmustin
Copy link
Contributor

added the keep label here, as far as i can tell this is mostly blocked by semconv consensus, which (reviewing the associated semconv pr) seems to be progressing/has some approvals/is close to merging. that being said, if this stays stuck in semconv purgatory for too long, i'd err toward merging this pr with the intention to update when semconv gets finalized, don't want to let perfect be the enemy of good.

@estolfo
Copy link
Author

estolfo commented Jun 5, 2023

Hey @ericmustin thanks for adding the keep label.
Based on discussions with the semantic conventions group, it seems like native otel instrumentation is a better path forward. So I've been working with the Elasticsearch clients team to write native instrumentation into the client libraries so that we don't have separate instrumentations that have to be maintained.
With that said, the proposed semantic conventions haven't been merged yet and the native instrumentation work is still in progress. Concerning this PR, we can close it if you'd like as it's unlikely this will be the path forward for OTel ES client instrumentation. Maybe I could have the Ruby group review the native instrumentation in the ES client when it's complete though?

@ericmustin
Copy link
Contributor

@estolfo ah, no worries! yea, native is better (and a shiny new thing so, cool to see y'all spearheading that approach!). definitely feel free to tag any of the folks from here for a second set of eyes when it's ready for review! let's go ahead and close this, we can always resurrect if we feel like its a better path in the future.

@ericmustin ericmustin closed this Jun 5, 2023
@cheempz
Copy link
Contributor

cheempz commented Jun 5, 2023

Late to the party but I have been watching this one with interest because we would like to provide elasticsearch instrumentation in our OTel custom distro for Ruby. I agree that client native instrumentation is the future, but wondering if/how native client instrumentation would support those using older versions of elastic clients?

@arielvalentin
Copy link
Collaborator

Also wondering what client native instrumentation means.

Is this the same as having 1st Party OTel API support in the gems themselves?

@estolfo
Copy link
Author

estolfo commented Jun 6, 2023

@cheempz Thanks for your interest in this PR and for the question. We aren't planning on backporting the native instrumentation feature to older versions of the ES client, so it'll only be available in whatever version of the client it's released in and upwards. There is no current Elasticsearch OTel client instrumentation in Ruby so I think this is a reasonable. Do you have any major concerns about this?

@estolfo
Copy link
Author

estolfo commented Jun 6, 2023

@arielvalentin Yes, I mean instrumentation (use of the OTel API, creation of spans, etc) in the Elasticsearch client code itself. We refer to it as "native instrumentation" but maybe this is confusing in the context of Ruby because "native" usually means a c extension? Is there another term you can recommend using in the context of Ruby?
Here is a rough outline of what it would look like in Ruby. There are also changes in the elasticsearch-api gem. I haven't completed this work or written tests yet but I will open a PR when it's more complete.
Here is the native instrumentation in the Elasticsearch Java client my colleague as been working on.
I would gladly come to the Ruby SIG today to chat more about it but I not able to make the time. I can come next week though :)

@cheempz
Copy link
Contributor

cheempz commented Jun 6, 2023

We aren't planning on backporting the native instrumentation feature to older versions of the ES client, so it'll only be available in whatever version of the client it's released in and upwards.

Thanks @estolfo, that helps set the expectation. Do you mind pinging back on this thread once a draft PR is created for the Ruby ES client?

@estolfo
Copy link
Author

estolfo commented Jun 12, 2023

@cheempz yes, sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep Ensures stale-bot keeps this issue/PR open stale Marks an issue/PR stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants