Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up display of parent processes #6185

Merged
merged 4 commits into from
Oct 9, 2024

Conversation

BartChris
Copy link
Collaborator

@BartChris BartChris commented Aug 19, 2024

Fixes #5934

As outlined in issue #5934, we currently face performance problems when working with parent processes in the editor, which contain a lot of children elements. In the issue discussion i thought that the problem lies in opening a lot of XML files for detection of the base type, but we are already using the index. I discovered that the problem seems to stem from accessing the index hundreds of times.
I tried to correct that and was able to bring the time to show a parent process with 600 files from more than 30 seconds to around 3 seconds.
My implementation does four things:

  • Instead of retrieving the base types for the linked processes when the structure tree is constructed recursively, i fetch all the base types before the structure tree is constructed
  • Instead of retrieving the documents from the Elasticsearch index one by one i fetch them in one go using the Mget operation. (https://www.elastic.co/guide/en/elasticsearch/reference/7.17/docs-multi-get.html)
  • We do not have to convert the document to a DTO since we only need the basetype, which we can fetch directly from the Elasticsearch response, which is reduced in size by specifiying the only source field we need: "baseType"
  • We can use a view cache for already derived views

@BartChris BartChris force-pushed the speed_up_parent_editor_view branch 3 times, most recently from 956bf22 to 662e730 Compare August 29, 2024 16:39
@BartChris BartChris marked this pull request as ready for review August 29, 2024 17:12
Copy link
Collaborator

@henning-gerhardt henning-gerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes looks good and there is a big speed improvement.

Two things:

  1. I'm only not sure how this change will survive the hibernate-search integration as there is a total other access to this needed data.
  2. This change should be back ported to the 3.7.x release branch as it is a big improvement for the 3.7.x releases which may be used a long time and it is currently not possible to change the metadata of hierarchy processes with a lot of child processes.

@solth
Copy link
Member

solth commented Sep 4, 2024

  • I'm only not sure how this change will survive the hibernate-search integration as there is a total other access to this needed data.

  • This change should be back ported to the 3.7.x release branch as it is a big improvement for the 3.7.x releases which may be used a long time and it is currently not possible to change the metadata of hierarchy processes with a lot of child processes.

I agree with both points. It would be great to get this improvement for Kitodo 3.7 as well. Concerning the integration of hibernate search, perhaps @matthias-ronge could weigh in on estimating wether he thinks the changes of this pull request get in the way of his ongoing works or whether they can be incorporated into hibernate search as well.

@BartChris
Copy link
Collaborator Author

BartChris commented Sep 16, 2024

I backported the changes to the 3.7 branch.

I'm only not sure how this change will survive the hibernate-search integration as there is a total other access to this needed data.

This was something i thought about as well. My impression from the Hibernate Branch and the discussion around it is, that we will strive to still enable index-based functionality including custom searches, when this is important for the functionality or it speeds up things significantly.
We will have to use the Hibernate Search abstraction, which unfortunately does not allow for MGET-Operations i suppose. But it should be possible to instead construct a normal search for multiple IDs. And we can also make the search more efficient by only retrieve what we need using projections. (https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#projections)
While I haven't tested it myself, I hope - and expect - that we can still take full advantage of the underlying index, even with the added abstraction layer introduced by Hibernate Search.

@solth solth merged commit 29421ca into kitodo:master Oct 9, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce the time to open superordinate processes with many subordinate processes
3 participants