Fix Sphinx' returning of malformed or incorrectly encoded fragments / excerpts #70

Open
dvglc opened this issue Sep 27, 2019 · 2 comments
Labels: bug (Something isn't working), needs-testing (issue may be solved, but some testing has to confirm this), search (Issues concerning the search engine/interface)

dvglc commented Sep 27, 2019

We currently have two workarounds, one in sphinx:highlight() and one in sphinx:excerpts(), for dealing with malformed or incorrectly encoded fragments / excerpts returned by Sphinx. Both use the same check:

if (validation:jaxp(util:base64-decode($response//httpclient:body), false())) then

if (validation:jaxp(util:base64-decode($response//httpclient:body), false())) then

Whenever malformed or badly encoded text occurs, it is simply filtered out. This seems to suppress ugly server errors successfully, but we should find out why (and when) Sphinx produces erroneous fragments/excerpts in the first place.
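To help find out why and when the bad responses occur, the filter could record which check failed instead of silently dropping the text. The following is a minimal sketch in Python (not the project's XQuery) under stated assumptions: the response body is base64-encoded, it is expected to be UTF-8, and it should parse as a single well-formed XML document. The function name `classify_body` and its return labels are hypothetical and serve illustration only.

```python
import base64
import binascii
import xml.etree.ElementTree as ET


def classify_body(b64_body: str) -> str:
    """Return 'ok', 'bad-base64', 'bad-encoding', or 'malformed-xml'.

    Hypothetical helper: it separates the failure modes that the
    current workaround lumps together into one true/false answer.
    """
    # Step 1: is the payload valid base64 at all?
    try:
        raw = base64.b64decode(b64_body, validate=True)
    except (binascii.Error, ValueError):
        return "bad-base64"

    # Step 2: do the decoded bytes form valid UTF-8 (assumed encoding)?
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "bad-encoding"

    # Step 3: is the decoded text well-formed XML?
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return "malformed-xml"

    return "ok"


if __name__ == "__main__":
    good = base64.b64encode(b"<p>ok</p>").decode("ascii")
    broken = base64.b64encode(b"<p><b>x</p>").decode("ascii")
    print(classify_body(good), classify_body(broken))
```

Logging the returned label together with the query that triggered it would show whether sphinx:highlight() and sphinx:excerpts() fail for the same reason.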

Interestingly, if one tries to prevent such errors before sending excerpts to Sphinx, e.g. by hand-crafted string replacements of suspicious characters, like so:

replace(replace($requestDoc, '%3D', '='), '%26amp%3B', '&'),

Sphinx' highlighting of query terms in excerpts doesn't work in all cases.
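One thing worth checking is what the hand-crafted replacement actually produces compared with generic percent-decoding. The sketch below is in Python rather than XQuery and is illustrative only: the function `handcrafted` mirrors the two nested replace() calls quoted above, and the sample string is made up. Whether this difference is related to the highlighting failures is an open question, not a finding.

```python
from urllib.parse import unquote


def handcrafted(request_doc: str) -> str:
    """Mirror the two nested replace() calls quoted in this issue."""
    # Inner call first: '%3D' becomes '='.
    # Outer call second: '%26amp%3B' becomes a bare '&'.
    return request_doc.replace("%3D", "=").replace("%26amp%3B", "&")


# Made-up sample: a key/value pair, an escaped ampersand, and a space.
sample = "q%3Dlex%26amp%3Bius%20naturale"

# The hand-crafted version collapses '%26amp%3B' to a bare '&'
# and leaves every other percent-escape (here '%20') untouched.
print(handcrafted(sample))

# Generic percent-decoding yields '&amp;' (still XML-escaped)
# and also decodes '%20' to a space.
print(unquote(sample))
```

The two results differ in two ways: the bare `&` versus `&amp;`, and the escapes that the hand-crafted version does not cover. Either could change what Sphinx receives as the text to highlight.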

(See also: 7776af3 and 03a62e3)

@dvglc dvglc self-assigned this Sep 27, 2019
@dvglc dvglc added the bug Something isn't working label Sep 27, 2019
@dvglc dvglc added this to the v1.6 milestone Sep 27, 2019
@dvglc dvglc changed the title from "Fix Sphinx' returning of malformed or incorrectly encoded snippets / excerpts" to "Fix Sphinx' returning of malformed or incorrectly encoded fragments / excerpts" Sep 27, 2019
dvglc commented Sep 27, 2019

Although the workaround is the same in both cases, the malformed fragments in sphinx:highlight may be a different problem from the erroneously encoded excerpts in sphinx:excerpts. Then again, the underlying issue may be the same.

dvglc commented Feb 5, 2020

This might have been resolved by the latest changes to the EXPath HTTP Client, but it needs further testing/observation.

@awagner-mainz awagner-mainz added the needs-testing issue may be solved, but some testing has to confirm this label Feb 19, 2020
@dvglc dvglc modified the milestones: v1.6, v2.1 Mar 3, 2020
@awagner-mainz awagner-mainz added the search Issues concerning the search engine/interface label Aug 19, 2020