Testing support for kubevirt #1213

Merged
merged 14 commits into from
Mar 13, 2024
Conversation

@nirs nirs commented Feb 22, 2024

Additional changes required for kubevirt support.

  • Add PVC based VM sample configuration
  • Add systemd service for the registry
  • How to use a local registry for minikube clusters
  • How to configure local git server
  • Extend CDI certificate lifetime to 1 week
  • Extend kubevirt certificate lifetime to 1 week
  • Allow pulling from local insecure registry
  • Add drenv suspend and resume commands
  • Add drenv --skip-addons option
  • Trim kubevirt environment
  • Fix waiting for CDI
  • Use the standard cirros image in CDI test
  • Update kubevirt to latest release
  • Upgrade CDI to latest release

Status:

  • Test with the previous kubevirt/cdi version
  • Test with the new vm-pvc from ocm-ramen-samples

@nirs nirs mentioned this pull request Feb 22, 2024
@nirs nirs force-pushed the test-kubevirt branch 3 times, most recently from 320f68c to 32c7ee8 Compare March 5, 2024 13:15
@nirs nirs force-pushed the test-kubevirt branch 2 times, most recently from 6ecad7e to 3481a07 Compare March 11, 2024 13:29
@nirs nirs marked this pull request as ready for review March 11, 2024 14:13
@nirs nirs mentioned this pull request Mar 11, 2024
@nirs nirs marked this pull request as draft March 13, 2024 16:48
@nirs nirs marked this pull request as ready for review March 13, 2024 18:13
nirs commented Mar 13, 2024

Example run

$ basic-test/run -c configs/kubevirt/vm-pvc-k8s-regional.yaml envs/regional-dr-kubevirt.yaml 
2024-03-13 20:23:52,164 INFO    [deploy] Deploying application
2024-03-13 20:23:52,165 INFO    [deploy] Deploying application 'vm-pvc'
2024-03-13 20:23:53,403 INFO    [deploy] Waiting for 'placement.cluster.open-cluster-management.io/placement' decisions
2024-03-13 20:23:53,632 INFO    [deploy] Application running on cluster 'dr1'
2024-03-13 20:23:53,777 INFO    [enable-dr] Enable DR
2024-03-13 20:23:53,840 INFO    [enable-dr] Disabling OCM scheduling for 'placement.cluster.open-cluster-management.io/placement'
2024-03-13 20:23:53,994 INFO    [enable-dr] Waiting for 'placement.cluster.open-cluster-management.io/placement' decisions
2024-03-13 20:23:54,405 INFO    [enable-dr] waiting for namespace vm-pvc
2024-03-13 20:23:54,530 INFO    [enable-dr] Waiting until 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' reports status
2024-03-13 20:23:55,108 INFO    [enable-dr] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' Available condition
2024-03-13 20:23:55,312 INFO    [enable-dr] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' PeerReady condition
2024-03-13 20:23:55,512 INFO    [enable-dr] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' first replication
2024-03-13 20:25:24,134 INFO    [enable-dr] DR enabled
2024-03-13 20:25:24,289 INFO    [failover] Fail over application
2024-03-13 20:25:24,356 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' Available condition
2024-03-13 20:25:24,535 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' PeerReady condition
2024-03-13 20:25:24,717 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' first replication
2024-03-13 20:25:24,851 INFO    [failover] Waiting for 'placement.cluster.open-cluster-management.io/placement' decisions
2024-03-13 20:25:25,131 INFO    [failover] Starting failover for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' to cluster 'dr2'
2024-03-13 20:25:25,294 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' Available condition
2024-03-13 20:25:54,293 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' PeerReady condition
2024-03-13 20:29:24,245 INFO    [failover] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' first replication
2024-03-13 20:29:24,317 INFO    [failover] Application was failed over
2024-03-13 20:29:24,480 INFO    [relocate] Relocate application
2024-03-13 20:29:24,536 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' Available condition
2024-03-13 20:29:24,708 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' PeerReady condition
2024-03-13 20:29:24,876 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' first replication
2024-03-13 20:29:25,008 INFO    [relocate] Waiting for 'placement.cluster.open-cluster-management.io/placement' decisions
2024-03-13 20:29:25,291 INFO    [relocate] Starting relocate for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' to cluster 'dr1'
2024-03-13 20:29:25,446 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' phase 'Relocated'
2024-03-13 20:31:24,262 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' Available condition
2024-03-13 20:31:24,446 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' PeerReady condition
2024-03-13 20:31:54,256 INFO    [relocate] Waiting for 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc' first replication
2024-03-13 20:31:54,337 INFO    [relocate] Application was relocated
2024-03-13 20:31:54,477 INFO    [disable-dr] Disable DR
2024-03-13 20:31:54,537 INFO    [disable-dr] Deleting 'drplacementcontrol.ramendr.openshift.io/vm-pvc-drpc'
2024-03-13 20:32:24,202 INFO    [disable-dr] Enabling OCM scheduling for 'placement.cluster.open-cluster-management.io/placement'
2024-03-13 20:32:24,300 INFO    [disable-dr] DR was disabled
2024-03-13 20:32:24,458 INFO    [undeploy] Deleting application
2024-03-13 20:32:24,458 INFO    [undeploy] Undeploying application 'vm-pvc'
2024-03-13 20:32:30,689 INFO    [undeploy] Application was deleted

nirs added 13 commits March 13, 2024 21:46
There is no point in using two versions of the same image. Using this
image in the CDI test can save time in the kubevirt tests later, since
the image is already cached.

Signed-off-by: Nir Soffer <[email protected]>
CDI may become available before it is ready to use. If we try to use it
while it is still progressing, we may fail with errors about missing
CRDs. Wait until the Progressing condition becomes false.

Example run showing the issue:

    2024-01-10 21:42:24,080 DEBUG   [kubevirt/1] Deploying cdi cr
    2024-01-10 21:42:25,674 DEBUG   [kubevirt/1] Waiting until cdi cr is available
    2024-01-10 21:42:26,005 DEBUG   [kubevirt/1] cdi.cdi.kubevirt.io/cdi condition met

We stopped waiting here...

    2024-01-10 21:42:26,007 DEBUG   [kubevirt/1] Waiting until cdi cr finished progressing
    2024-01-10 21:42:39,472 DEBUG   [kubevirt/1] cdi.cdi.kubevirt.io/cdi condition met

But CDI finished progressing 13 seconds later.

Signed-off-by: Nir Soffer <[email protected]>
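The two-step wait described above can be sketched with `kubectl wait`
(a sketch against the `cdi` CR from the log above, not the exact drenv
implementation):

```shell
# Wait until the CDI CR reports the Available condition...
kubectl wait cdi/cdi --for=condition=Available --timeout=300s

# ...but do not stop there: also wait until it finished progressing,
# so all CRDs and webhooks are deployed before we start using them.
kubectl wait cdi/cdi --for=condition=Progressing=False --timeout=300s
```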
We cannot use volsync with ramen yet, and the kubevirt environment is
already too big. Without volsync we can remove the volumesnapshot addon
and submariner, which does not handle suspending the machine running
the minikube VMs well.

With this change we should be able to start an environment, suspend the
laptop, and resume it in an environment with an unreliable network or
no network access. This will be useful for live demos at conferences.

Keep volsync enabled in `regional-dr` and `regional-dr-hubless` to keep
the submariner and volsync addons functional.

Signed-off-by: Nir Soffer <[email protected]>
This is useful for quickly starting a previously working environment
without redeploying everything. The main motivation is using a
pre-created environment in a location with a weak network, such as a
conference.

Other use cases are working around bugs in addons that do not work well
when starting a stopped cluster, for example clusteradm.

With `--skip-addons` we skip the `start` and `stop` hooks but still run
the `test` hooks. This is useful for starting a stopped environment
quickly while verifying that it still works. To skip all hooks, run
with both `--skip-addons` and `--skip-tests`.

Example run:

    $ drenv start --skip-addons --skip-tests $env
    2023-11-20 00:59:25,341 INFO    [rdr-kubevirt] Starting environment
    2023-11-20 00:59:25,464 INFO    [dr1] Starting minikube cluster
    2023-11-20 00:59:29,566 INFO    [hub] Starting minikube cluster
    2023-11-20 00:59:29,578 INFO    [dr2] Starting minikube cluster
    2023-11-20 01:00:23,402 INFO    [dr1] Cluster started in 57.94 seconds
    2023-11-20 01:00:23,402 INFO    [dr1] Configuring containerd
    2023-11-20 01:00:24,936 INFO    [dr1] Waiting until all deployments are available
    2023-11-20 01:00:28,749 INFO    [hub] Cluster started in 59.18 seconds
    2023-11-20 01:00:28,750 INFO    [hub] Waiting until all deployments are available
    2023-11-20 01:00:53,834 INFO    [dr2] Cluster started in 84.26 seconds
    2023-11-20 01:00:53,834 INFO    [dr2] Configuring containerd
    2023-11-20 01:00:55,042 INFO    [dr2] Waiting until all deployments are available
    2023-11-20 01:01:01,063 INFO    [hub] Deployments are available in 32.31 seconds
    2023-11-20 01:01:09,482 INFO    [dr1] Deployments are available in 44.55 seconds
    2023-11-20 01:01:34,661 INFO    [dr2] Deployments are available in 39.62 seconds
    2023-11-20 01:01:34,661 INFO    [rdr-kubevirt] Dumping ramen e2e config to '/home/nsoffer/.config/drenv/rdr-kubevirt'
    2023-11-20 01:01:34,827 INFO    [rdr-kubevirt] Environment started in 129.49 seconds

Signed-off-by: Nir Soffer <[email protected]>
Suspend or resume the underlying virtual machines. For now we assume
the kvm2 driver to keep things simple; this needs a better
implementation later so it also works with the qemu2 driver.

The use case is building the environment with good network, suspending
it, and resuming it in an environment with flaky network for demo.

Signed-off-by: Nir Soffer <[email protected]>
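With the kvm2 driver, each minikube profile is backed by a libvirt
domain named after the profile, so suspend and resume can be sketched
with `virsh` (the cluster names are from this environment; the drenv
implementation may differ):

```shell
# Suspend the libvirt domains backing the minikube clusters.
for domain in hub dr1 dr2; do
    virsh -c qemu:///system suspend "$domain"
done

# Later, resume them in place, preserving all guest state.
for domain in hub dr1 dr2; do
    virsh -c qemu:///system resume "$domain"
done
```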
Configure CDI to allow pulling from a local insecure registry. This is
useful for demos in an environment with an unreliable network, or for
CI environments where we want to avoid random failures due to a flaky
network.

The image must first be pushed to the local registry; this is easy
using the standard `podman push` command.

Signed-off-by: Nir Soffer <[email protected]>
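A minimal sketch of the CDI configuration, assuming the registry is
reachable as `host.minikube.internal:5000` (the actual host and port
used in this PR may differ):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  config:
    # Allow importing images from the local registry without TLS.
    insecureRegistries:
      - host.minikube.internal:5000
```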
To avoid certificate renewals during testing.

Without this I experienced this error:

    drenv.commands.Error: Command failed:
       command: ('kubectl', 'apply', '--context', 'dr1', '--kustomize=cr')
       exitcode: 1
       error:
          Error from server (InternalError): error when applying patch:
          {"spec":{"configuration":{"developerConfiguration":{"featureGates":[]}}}}
          to:
          Resource: "kubevirt.io/v1, Resource=kubevirts", GroupVersionKind: "kubevirt.io/v1, Kind=KubeVirt"
          Name: "kubevirt", Namespace: "kubevirt"
          for: "cr": error when patching "cr": Internal error occurred: failed calling webhook
               "kubevirt-update-validator.kubevirt.io": failed to call webhook: Post
               "https://kubevirt-operator-webhook.kubevirt.svc:443/kubevirt-validate-update?timeout=10s":
               tls: failed to verify certificate: x509: certificate has expired or is not yet valid:
               current time 2024-01-26T19:05:52Z is after 2024-01-26T16:24:46Z

Thanks: Michael Henriksen <[email protected]>
Signed-off-by: Nir Soffer <[email protected]>
To avoid certificate renewals during testing.

Without this I experienced this error when starting a stopped
environment after a day:

   drenv.commands.Error: Command failed:
      command: ('kubectl', 'apply', '--context', 'dr2', '--kustomize=disk')
      exitcode: 1
      error:
         Error from server (InternalError): error when creating "disk": Internal
         error occurred: failed calling webhook "populator-validate.cdi.kubevirt.io":
         failed to call webhook: Post "https://cdi-api.cdi.svc:443/populator-validate?timeout=30s":
         tls: failed to verify certificate: x509: certificate has expired or is not yet valid:
         current time 2024-01-28T14:08:01Z is after 2024-01-27T19:15:20Z

Signed-off-by: Nir Soffer <[email protected]>
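For CDI the equivalent knob is `spec.certConfig` on the CDI CR; a
sketch, assuming one-week durations (the renewBefore value here is
illustrative):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  certConfig:
    ca:
      duration: 168h    # 1 week
      renewBefore: 24h
    server:
      duration: 168h
      renewBefore: 24h
```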
Using a local git server we can deploy OCM applications without network
access to github. This is useful for demos when the network is
unreliable, for example at a conference.

Signed-off-by: Nir Soffer <[email protected]>
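One way to serve repositories locally is git's built-in daemon; a
minimal sketch, assuming the repositories live under `/srv/git` (this
PR may configure the server differently):

```shell
# Export all repositories under /srv/git, read-only, on git://localhost/
git daemon \
    --base-path=/srv/git \
    --export-all \
    --reuseaddr

# Clients can then clone without network access to github:
#   git clone git://localhost/ocm-ramen-samples.git
```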
Using a local registry is useful for demos when the network is
unreliable, for example at a conference. It can also be used to avoid
random failures when the network is flaky, by caching remote images
locally.

Signed-off-by: Nir Soffer <[email protected]>
With this you can run the local registry as a systemd service that
starts at boot, instead of starting the registry manually whenever you
want to use it.

Signed-off-by: Nir Soffer <[email protected]>
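A sketch of such a unit, assuming the registry runs via podman on port
5000 (the unit name, paths, and volume name here are illustrative, not
necessarily what this PR adds):

```ini
# ~/.config/systemd/user/registry.service
[Unit]
Description=Local container registry
After=network-online.target

[Service]
ExecStart=/usr/bin/podman run --rm --name registry \
    -p 5000:5000 -v registry:/var/lib/registry docker.io/library/registry:2
ExecStop=/usr/bin/podman stop registry
Restart=on-failure

[Install]
WantedBy=default.target
```

Enable it with `systemctl --user enable --now registry.service`.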
Since we plan multiple configurations for kubevirt, we use the same
layout as the ocm-ramen-samples `subscription/` directory:

    configs/
    ├── deployment-k8s-regional-rbd.yaml
    └── kubevirt
        └── vm-pvc-k8s-regional.yaml

To run the basic tests using a VM:

    basic-test/run -c configs/kubevirt/vm-pvc-k8s-regional.yaml $env

Signed-off-by: Nir Soffer <[email protected]>
@ShyamsundarR ShyamsundarR merged commit 7a245df into RamenDR:main Mar 13, 2024
14 of 15 checks passed