Added transactional caches to external Infinispan #737


pruivo
Contributor

@pruivo pruivo commented Mar 13, 2024

  • CROSS_DC_TX_2PC=true|false configures transactions for the external Infinispan caches (disabled by default)

Other changes include

  • Use 'dataset-import.sh' script in dataset-import-* tasks
  • Update ec2-run to match the nightly job
  • Automatic take-offline for cross-site enabled by default with 30 sec minimum wait time before switching

Signed-off-by: Pedro Ruivo <[email protected]>
@mhajas
Contributor

mhajas commented Mar 13, 2024

Nice work @pruivo! Does this solve issue #730? If you haven't tested this yet, I can try to run the functional test against this change tomorrow.

@pruivo
Contributor Author

pruivo commented Mar 13, 2024

I did not run any testing. Is your A/A test already merged in the main branch?

@kami619
Contributor

kami619 commented Mar 15, 2024

Do we have an Active/Active functional test available in a branch somewhere? @pruivo, @mhajas?

@pruivo
Contributor Author

pruivo commented Mar 15, 2024

We have the deadlock reproducer here: https://github.com/mhajas/keycloak-benchmark/tree/active-active-deadlock-reproducer

@mhajas
Contributor

mhajas commented Mar 18, 2024

@pruivo I tested this today and I can still reproduce the failure. Also, the branch you linked is incorrect. Here is a detailed description of what is happening and how to reproduce: #730

I will investigate this more after lunch.

@mhajas
Contributor

mhajas commented Mar 18, 2024

I can confirm that the behavior described in #730 is still reproducible with this PR. My rosa-cross-dc config:

# GENERIC
TIMEOUT=1200 # 20 minutes in seconds - 300 may be too low if we need to wait for machines to be provisioned
ROSA_CLUSTER_NAME_1=gh-keycloak-a
ROSA_CLUSTER_NAME_2=gh-keycloak-b

# AURORA
AURORA_CLUSTER=gh-keycloak

# ROUTE53
KC_CLIENT_URL=client.url
KC_HEALTH_URL_CLUSTER_1=primary.url
KC_HEALTH_URL_CLUSTER_2=backup.url

# INFINISPAN
KC_ISPN_NAMESPACE=mhajas-keycloak
CROSS_DC_MODE=SYNC
CROSS_DC_TX_2PC=true

# KEYCLOAK
KC_NAMESPACE_PREFIX='mhajas-'
KC_DATABASE=aurora-postgres
KC_DB_POOL_MIN_SIZE=15
KC_DB_POOL_MAX_SIZE=15
KC_DB_POOL_INITIAL_SIZE=15
KC_STORAGE=
KC_INSTANCES=1
KC_DISABLE_STICKY_SESSION='true'
KC_MEMORY_REQUESTS_MB=2048
KC_MEMORY_LIMITS_MB=3000
KC_HEAP_MAX_MB=2048
KC_HEAP_INIT_MB=1024
KC_CPU_REQUESTS=8
KC_CPU_LIMITS=8
KC_OTEL='true'
KC_CUSTOM_INFINISPAN_CONFIG_FILE=config/kcb-infinispan-cache-remote-store-config.xml

Also, here is the sessions cache config from my deployment:

spec:
  clusterName: infinispan
  name: sessions
  template: |-
    distributedCache:
      mode: "SYNC"
      owners: "2"
      statistics: "true"
      remoteTimeout: "5000"
      locking:
        acquireTimeout: "4000"
      transaction:
        mode: "NON_XA"
        locking: "OPTIMISTIC"
      stateTransfer:
        chunkSize: "16"
      backups:
        mergePolicy: "ALWAYS_REMOVE" # <1>
        gh-keycloak-a: # <2>
          backup:
            strategy: "SYNC" # <3>
            timeout: "4500"
            failurePolicy: "WARN"
            twoPhaseCommit: "true"
            stateTransfer:
              chunkSize: "16"
            takeOffline:
              minWait: "30000"
              afterFailures: "-1"

@pruivo
Contributor Author

pruivo commented Mar 18, 2024

Do you mean deadlocks? Yes, those can still happen.

@ahus1
Contributor

ahus1 commented Mar 18, 2024

I suppose the intent was to try failurePolicy: "FAIL", since only then will Keycloak see that an update failed in the remote DC and retry; otherwise the embedded cache will be out of date?
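
For illustration, a minimal sketch of that variation, assuming the same backup block as in the sessions cache config posted earlier in this thread with only failurePolicy changed. The site name and timeout are copied from that config; whether this setting changes the behavior described in #730 has not been verified here.

```yaml
# Sketch only: backup block from the sessions cache config above,
# with failurePolicy switched from "WARN" to "FAIL" (untested assumption).
backups:
  gh-keycloak-a:
    backup:
      strategy: "SYNC"
      timeout: "4500"
      failurePolicy: "FAIL"   # surface remote-site write failures to the caller
      twoPhaseCommit: "true"
```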

@pruivo
Contributor Author

pruivo commented Mar 18, 2024

@mhajas can you point me to the test that you are running?

@mhajas
Contributor

mhajas commented Mar 18, 2024

@pruivo It is described here: #730 (comment)

There is also a "How to Reproduce?" section.
