Skip to content

Conversation

@igoristic
Copy link
Contributor

@igoristic igoristic commented Nov 5, 2025

Basic testing:

  1. Make sure both search-gov and search-services (with Elasticsearch and OpenSearch) are running together
  2. Delete any old data in both indices
  3. Create new fake data using the following script: https://gist.github.com/igoristic/91b859e40b971b71ea400eb52d9d0b24
  4. Run the following jobs:
# For ElasticSearch:
bin/docker bundle exec rails runner "Rails.logger = Logger.new(STDOUT); ElasticsearchDeleteByQueryJob.perform_now"

# For OpenSearch:
bin/docker bundle exec rails runner "Rails.logger = Logger.new(STDOUT); OpenSearchDeleteByQueryJob.perform_now"
  1. Check to confirm any documents older than SEARCHELASTIC_RETENTION_DAYS are deleted

Checklist

Please ensure you have addressed all concerns below before marking a PR "ready for review" or before requesting a re-review. If you cannot complete an item below, replace the checkbox with the ⚠️ :warning: emoji and explain why the step was not completed.

Functionality Checks

  • You have merged the latest changes from the target branch (usually main) into your branch.

  • Your primary commit message is of the format SRCH-#### <description> matching the associated Jira ticket.

  • PR title is either of the format SRCH-#### <description> matching the associated Jira ticket (i.e. "SRCH-123 implement feature X"), or Release - SRCH-####, SRCH-####, SRCH-#### matching the Jira ticket numbers in the release.

  • Automated checks pass. If Code Climate checks do not pass, explain reason for failures:

Process Checks

  • You have specified at least one "Reviewer".

@igoristic igoristic self-assigned this Nov 5, 2025
@igoristic igoristic marked this pull request as ready for review November 14, 2025 21:15
@selfdanielj selfdanielj self-requested a review November 17, 2025 14:50
Copy link
Contributor

@selfdanielj selfdanielj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality works great! I was able to run the test script and confirm that data older than the number of days is deleted while data newer than that was not. Left a couple of questions.

Copy link
Contributor

@stevenbarragan stevenbarragan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the loops waiting functionality this seems over-engineered. The job handle failures and retries just fine automatically.

@igoristic
Copy link
Contributor Author

@stevenbarragan Thanks for the review!

Can you re-review the app/jobs/elasticsearch_delete_by_query_job.rb file. I only made changes in Elasticsearch logic for now. If they look good, I will apply the same changes to Opensearch and fix all unit tests

Copy link
Contributor

@stevenbarragan stevenbarragan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test. I have requested some changes that I think are extra.

]

clusters.each do |cluster|
puts "Waiting for #{cluster[:name]} to be ready..."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a wait for opensearch operation on circleci

wait_for_opensearch:
, perhaps we need a similar one for elasticsearch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw you added a wait_for_elasticsearch step to circleci, so we don't need to do it in here anymore

Copy link
Contributor

@selfdanielj selfdanielj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to verify functionality in both opensearch and elasticsearch!

I know i missed some converstations but was there still a plan to support a -1 retention value that would prevent the scheduler from running?

Copy link
Contributor

@stevenbarragan stevenbarragan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to re-initiate OPENSEARCH_CLIENT.

# omniauth_login_dot_gov gem.
#
# The elasticsearch-ruby gem is compatible with OpenSearch.
OPENSEARCH_CLIENT = Elasticsearch::Client.new(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OpenSearch client for both regular searches and analytics are defined here. Please remove this file.

]

clusters.each do |cluster|
puts "Waiting for #{cluster[:name]} to be ready..."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw you added a wait_for_elasticsearch step to circleci, so we don't need to do it in here anymore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants