[importExternalReference] Bypasses certain security measures when generating a PDF or MD #2761

Open: wants to merge 6 commits into master


@Megafredo (Member) commented Oct 4, 2024

Proposed changes

  • Add more selectors for cookie consent
  • If clicking the cookie consent button fails, try to hide the unwanted elements instead
  • Refactor process_playwright and create a new browser context with a specified user agent to simulate a particular browser
  • Remove the unused env variable
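The fallback in the second bullet (hiding unwanted elements when the consent click fails) could look roughly like the minimal sketch below. It assumes a Playwright sync `Page` object and uses `page.add_style_tag`; the selector list and the helper names `build_hide_css` and `hide_unwanted_elements` are hypothetical and are not the PR's actual code.

```python
# Hypothetical selectors for common consent banners; the PR's real list is not shown here.
UNWANTED_SELECTORS = (
    "#onetrust-banner-sdk",
    "#onetrust-consent-sdk",
    ".fc-consent-root",
    "[id*='cookie'][class*='banner']",
)


def build_hide_css(selectors):
    """Return one CSS rule that hides every element matched by `selectors`."""
    return ", ".join(selectors) + " { display: none !important; }"


def hide_unwanted_elements(page, selectors=UNWANTED_SELECTORS):
    """Inject the hiding rule into a Playwright page (hypothetical fallback helper).

    Returns the injected CSS, or None when there is nothing to hide.
    """
    if not selectors:
        return None
    css = build_hide_css(selectors)
    # add_style_tag inserts a <style> element into the current document.
    page.add_style_tag(content=css)
    return css
```

Injecting a stylesheet rule, rather than deleting nodes with JavaScript, also hides matching banners that the site inserts after the rule is added, since the rule applies to any element that matches the selector.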

Related issues

Checklist

  • I consider the submitted work as finished
  • I tested the code for its functionality using different use cases
  • I added/updated the relevant documentation (either on GitHub or on Notion)
  • Where necessary I refactored code to improve the overall quality

Further comments

@Megafredo added the bug and filigran team labels on Oct 4, 2024
@Megafredo Megafredo added this to the Bugs backlog milestone Oct 4, 2024
@Megafredo Megafredo self-assigned this Oct 4, 2024
@SamuelHassine (Member)

@Megafredo @helene-nguyen can we also use a more standard user agent with Playwright, like the one in the previous version of the connector (Mozilla ...)?
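A minimal sketch of opening a page in a browser context with an explicit user agent, using Playwright's `browser.new_context(user_agent=...)`. The UA string below is an assumption (a typical desktop Chrome value), since the one used by the previous connector version is elided above, and the `open_page` helper is hypothetical.

```python
# Hypothetical "standard" desktop Chrome user agent; not the connector's actual value.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/129.0.0.0 Safari/537.36"
)


def open_page(playwright, url, user_agent=DEFAULT_USER_AGENT, timeout_ms=30000):
    """Launch headless Chromium and open `url` with a custom user agent.

    Returns (browser, page) so the caller can close the browser when done.
    """
    browser = playwright.chromium.launch(headless=True)
    # Every page created from this context sends the given User-Agent header.
    context = browser.new_context(user_agent=user_agent)
    page = context.new_page()
    page.goto(url, timeout=timeout_ms)
    return browser, page
```

Typical use would be inside `with sync_playwright() as p:`, calling `browser, page = open_page(p, url)` and then `browser.close()` once the PDF or Markdown has been generated.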

@flavienSindou (Contributor) left a comment


Looks good to me 👍
Just a remark on error handling.

Comment on lines 86 to 87:

```python
except:
    continue
```
Contributor


I think you could catch only Playwright errors here and log the message at debug level.

https://playwright.dev/python/docs/api/class-error [consulted on October 7th, 2024]
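A sketch of this suggestion applied to a consent-click loop: catch only `playwright.sync_api.Error` (Playwright's `TimeoutError` is a subclass of it) and log at debug level. The function name `click_cookie_consent`, its parameters, and the `ImportError` fallback class are assumptions for illustration; the `found` flag and the 2000 ms wait mirror the `if found: page.wait_for_timeout(2000)` lines quoted in this review.

```python
import logging

try:
    from playwright.sync_api import Error as PlaywrightError
except ImportError:
    # Stand-in so the sketch runs without Playwright; the real class is playwright.sync_api.Error.
    class PlaywrightError(Exception):
        pass

logger = logging.getLogger(__name__)


def click_cookie_consent(page, selectors, timeout_ms=2000):
    """Click the first consent selector that works; return True if one was clicked."""
    found = False
    for selector in selectors:
        try:
            page.click(selector, timeout=timeout_ms)
            found = True
            break
        except PlaywrightError as err:
            # A missing or non-clickable selector is expected on most sites,
            # so record it at debug level and try the next one.
            logger.debug("Cookie consent selector %r failed: %s", selector, err)
    if found:
        # Give the page time to dismiss the banner before capturing it.
        page.wait_for_timeout(2000)
    return found
```

Unlike a bare `except:`, this lets unrelated failures (programming errors, `KeyboardInterrupt`) propagate instead of silently moving on to the next selector.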

```python
pass
if found:
page.wait_for_timeout(2000)
browser, page = self._process_playwright(p, url_to_import)
```
Contributor


Nice code splitting 👍.

@flavienSindou (Contributor) left a comment


Code looks ok to me.

Labels
bug (use for describing something not working as expected), filigran team (use to identify PR from the Filigran team)
Development

Successfully merging this pull request may close these issues.

[ImportExternalReference] BleepingComputer not importable due to Cloudflare protection
3 participants