
Add datafit with log det datafit #265

Open
QB3 opened this issue Jul 5, 2024 · 3 comments

@QB3 (Collaborator) commented Jul 5, 2024

Description of the feature

Handle $\log\det$-like datafits and allow variables with multiple roles.

Additional context

@bgirault-inria is currently solving (simpler) graphical-Lasso-like problems and could potentially benefit from our solvers.

The solved problem is
$$\min_{w \in \mathbb{R}^p, q \in \mathbb{R}^m} \log \det (\mathrm{diag}(q) + \sum_e w_e b_e b_e^T) + A w + B q + \lambda \sum_e w_e + \sum_e i_{w_e \geq 0} + \sum_i i_{q_i \geq 0} $$

with given $A, B, b_e$.
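For illustration, the objective can be evaluated with NumPy. This is only a sketch on hypothetical data: the sizes, the vectors $b_e$, and the interpretation of $A$ and $B$ as vectors acting as linear functionals (so that $Aw$ and $Bq$ are scalars) are all assumptions.

```python
import numpy as np

# Hypothetical sizes and data: w has p entries, q has m entries, each
# b_e lives in R^m, and A, B are taken as vectors so that A @ w and
# B @ q are scalar linear terms.
rng = np.random.default_rng(0)
p, m, lam = 5, 4, 0.1
b = rng.standard_normal((p, m))   # row e is the vector b_e
A = rng.standard_normal(p)
B = rng.standard_normal(m)

def objective(w, q):
    """Datafit + penalties; +inf outside the nonnegativity constraints."""
    if np.any(w < 0) or np.any(q < 0):
        return np.inf
    M = np.diag(q) + (b.T * w) @ b    # diag(q) + sum_e w_e b_e b_e^T
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:                      # log det undefined
        return np.inf
    return logdet + A @ w + B @ q + lam * w.sum()

print(objective(np.ones(p), np.ones(m)))
```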

This optimization problem raises two main challenges for our current implementation:

  • the $\log \det$ derivative would require storing an additional quantity for an efficient rank-one update;
  • there are two variables, $w$ and $q$, with two different roles.
@bgirault-inria

The overall optimization problem can be rewritten in the following form, with a single set of variables:

$$\min_{\mathbf{x}\in\mathbb{R}^p} -\log\det\left(\mathbf{C} + \sum_i x_i \mathbf{a}^{(i)}{\mathbf{a}^{(i)}}^T \right) + \sum_i \rho_i x_i + \sum_i \mathbb{I}_{x_i\geqslant 0} \text{,}$$

where $\forall i,\rho_i>0$.

Adding an $\ell_1$ regularization on a subset $\mathcal{E}$ of the variable indices, via the term $\lambda \sum_{i\in \mathcal{E}} x_i$, is then equivalent to adding $\lambda$ to each parameter $\rho_i$ with $i\in\mathcal{E}$. The form of the problem is thus unaltered, and we avoid having different roles for the variables (@QB3's 2nd bullet point).

Above, the $\log\det$ term is actually a regularization term (penalty), while $\rho_i x_i$ is a data-fitting term. If necessary, a simple rescaling of $\mathbf{x}$ swaps the two interpretations and yields an equivalent $\ell_1$-regularized minimization problem:

$$\min_{\mathbf{\tilde x}\in\mathbb{R}^p} -\log\det\left(\mathbf{C} + \sum_i \tilde x_i \mathbf{\tilde a}^{(i)}{\mathbf{\tilde a}^{(i)}}^T \right) + ||\mathbf{\tilde x}||_{1} + \sum_i \mathbb{I}_{{\tilde x}_i \geqslant 0} \text{.}$$

Interestingly, with either form of the problem, coordinate descent admits a closed-form solution for a single-variable update, and does not require computing a gradient (or Hessian, which is essentially free to compute from the gradient). However, whether we compute the gradient and Hessian or the closed-form optimum, we need to keep a matrix in memory for these computations, and update it efficiently each time the solution changes (@QB3's 1st bullet point).
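On hypothetical data, such a coordinate descent might be sketched as follows for the reformulated problem: each coordinate move uses a closed-form update (obtained by zeroing the 1-D derivative and clipping at 0), and the stored inverse $\Phi = (\mathbf{C} + \sum_i x_i \mathbf{a}^{(i)}{\mathbf{a}^{(i)}}^T)^{-1}$ is refreshed with a Sherman-Morrison rank-one update rather than recomputed.

```python
import numpy as np

# Sketch of a coordinate descent for
#   min_{x >= 0}  -log det(C + sum_i x_i a_i a_i^T) + sum_i rho_i x_i.
# We maintain Phi = (C + sum_i x_i a_i a_i^T)^{-1} and refresh it with a
# Sherman-Morrison rank-one update after each coordinate move.
# All sizes and data below are hypothetical.
rng = np.random.default_rng(0)
n, p = 4, 6
a = rng.standard_normal((p, n))   # row i is the vector a_i
C = np.eye(n)                     # assumed positive definite
rho = np.full(p, 2.0)

x = np.zeros(p)
Phi = np.linalg.inv(C)            # inverse at x = 0

for _ in range(50):               # sweeps of coordinate descent
    for i in range(p):
        Pa = Phi @ a[i]
        t = a[i] @ Pa             # a_i^T Phi a_i > 0
        x_new = max(0.0, x[i] + 1.0 / rho[i] - 1.0 / t)
        delta = x_new - x[i]
        if delta != 0.0:
            # Sherman-Morrison: inverse of (M + delta * a_i a_i^T)
            Phi -= delta * np.outer(Pa, Pa) / (1.0 + delta * t)
            x[i] = x_new

# Sanity check: the maintained inverse matches a direct inversion.
M = C + (a.T * x) @ a             # C + sum_i x_i a_i a_i^T
print(np.allclose(Phi, np.linalg.inv(M)))
```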

Would this be possible with skglm?

@mathurinm (Collaborator) commented Jul 8, 2024

@bgirault-inria is the matrix to store the inverse of $C + \sum_i x_i a_i a_i^\top$? If so, you can update it with Sherman-Morrison after a change to a single coordinate of $x$.

Do you have a reference for the exact coordinate update? Even in dimension 1, I don't immediately see a closed form for $\min_x -\log(c + a^2 x) + \lambda |x|$ subject to $x \geq 0$. OK, actually you just have to compute the unconstrained solution, and if it does not lie in $\mathbb{R}_+$, then 0 is the solution.
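For that 1-D problem with $x \geq 0$, the recipe above amounts to zeroing the derivative, which gives $x = 1/\lambda - c/a^2$, then clipping at 0. A quick numerical check on hypothetical values:

```python
import numpy as np

# Hypothetical check of the 1-D case: min_{x >= 0} -log(c + a^2 x) + lam * x.
# Zeroing the derivative -a^2 / (c + a^2 x) + lam gives x = 1/lam - c/a^2,
# clipped at 0 when the unconstrained solution is negative.
def solve_1d(c, a, lam):
    return max(0.0, 1.0 / lam - c / a ** 2)

def f(x, c, a, lam):
    return -np.log(c + a ** 2 * x) + lam * x

c, a, lam = 0.5, 1.3, 0.8
x_star = solve_1d(c, a, lam)
grid = np.linspace(0.0, 5.0, 100001)
x_grid = grid[np.argmin(f(grid, c, a, lam))]
print(x_star, x_grid)   # should agree up to grid resolution
```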

I doubt it directly fits the API of skglm (which is oriented towards GLMs), but the skglm algorithm (working sets + Anderson acceleration) can definitely be applied to it, and we can code it!

@bgirault-inria commented Jul 8, 2024

> @bgirault-inria is the matrix to store the inverse of ...?

Yes, it is. Let's denote it $\Phi$. The optimal update is then $x_i \leftarrow \max\left(0, x_i + \rho_i^{-1} - \left({\mathbf{a}^{(i)}}^T\Phi\,\mathbf{a}^{(i)}\right)^{-1}\right)$.
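This update can be sanity-checked numerically on hypothetical data: by the matrix determinant lemma, as a function of $x_i$ alone the coordinate objective reduces (up to a constant) to $g(t) = -\log(1 + t\,s) + \rho_i t$ with $s = {\mathbf{a}^{(i)}}^T \mathbf{M}_{-i}^{-1} \mathbf{a}^{(i)}$, where $\mathbf{M}_{-i}$ is the matrix without the $i$-th rank-one term. The formula's output can then be compared against a grid minimizer of $g$.

```python
import numpy as np

# Numerical sanity check on hypothetical data that
#   x_i <- max(0, x_i + 1/rho_i - 1/(a_i^T Phi a_i))
# minimizes the coordinate-wise objective g(t) = -log(1 + t*s) + rho*t.
rng = np.random.default_rng(1)
n, p, i = 4, 5, 2
a = rng.standard_normal((p, n))   # row j is the vector a_j
C = np.eye(n)
x = rng.uniform(0.0, 1.0, p)
rho = 1.5

M = C + (a.T * x) @ a             # current matrix; Phi is its inverse
Phi = np.linalg.inv(M)
t_phi = a[i] @ Phi @ a[i]
x_new = max(0.0, x[i] + 1.0 / rho - 1.0 / t_phi)

# Coordinate objective via the matrix determinant lemma.
M_minus = M - x[i] * np.outer(a[i], a[i])
s = a[i] @ np.linalg.inv(M_minus) @ a[i]
grid = np.linspace(0.0, 3.0, 300001)
g = -np.log1p(grid * s) + rho * grid
x_num = grid[np.argmin(g)]
print(x_new, x_num)               # should agree up to grid resolution
```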

References:
