matrix completion documentation
antagomir committed Feb 12, 2024
1 parent a28cd54 commit 2fa6867
Showing 3 changed files with 59 additions and 51 deletions.
8 changes: 7 additions & 1 deletion DESCRIPTION
@@ -37,11 +37,17 @@ Authors@R: c(person("Jari", "Oksanen", role=c("aut","cre"),
person("Cajo J.F.", "Ter Braak", role="aut"),
person("James", "Weedon", role="aut"))
Depends: permute (>= 0.9-0), lattice, R (>= 3.6.0)
Suggests: parallel, tcltk, knitr, markdown
Suggests:
parallel,
tcltk,
knitr,
markdown,
testthat (>= 3.0.0)
Imports: MASS, cluster, mgcv
VignetteBuilder: utils, knitr
Description: Ordination methods, diversity analysis and other
functions for community and vegetation ecologists.
License: GPL-2
BugReports: https://github.com/vegandevs/vegan/issues
URL: https://github.com/vegandevs/vegan
Config/testthat/edition: 3
78 changes: 39 additions & 39 deletions R/decostand.R
@@ -188,10 +188,10 @@
stop("Some samples do not contain observations and rclr cannot be calculated.")
}

## Divide all values by their sample-wide geometric means
## Divide all values by their sample-wide geometric means (in log space, subtract the mean log)
xx <- clog - means

## If there were zeros, there are infinite values after logarithmic transform.
## Zeros become infinite after the log transform.
## Convert those to NA
xx[is.infinite(xx)] <- NA
attr(xx, "parameters") <- list("means" = means)
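The rclr steps in this hunk (log-transform, subtract each sample's mean log, turn the former zeros into `NA`) can be sketched in base R as below. This is a minimal illustration, not vegan's code: `rclr_sketch()` is a hypothetical name, samples are assumed to be in rows, the way `means` is computed (mean of the finite logs per sample) is an assumption because those lines are outside the diff, and the OptSpace completion step is left out.

```r
# Hypothetical helper, not vegan's internal code: rclr without the
# OptSpace completion step. Samples are assumed to be in rows.
rclr_sketch <- function(x) {
  x <- as.matrix(x)
  clog <- log(x)                       # zeros become -Inf
  finite <- is.finite(clog)
  nobs <- rowSums(finite)              # positive entries per sample
  if (any(nobs == 0))
    stop("Some samples do not contain observations and rclr cannot be calculated.")
  # Log of each sample's geometric mean over its positive entries (assumed)
  means <- rowSums(ifelse(finite, clog, 0)) / nobs
  # Dividing by the geometric mean is a subtraction in log space;
  # 'means' recycles down the columns, so row i loses means[i]
  xx <- clog - means
  xx[is.infinite(xx)] <- NA            # former zeros are left missing
  attr(xx, "parameters") <- list(means = means)
  xx
}

x <- matrix(c(1, 0, 4,
              2, 8, 0), nrow = 2, byrow = TRUE)
rclr_sketch(x)
```

For this `x`, each row keeps two finite values, `-log(2)` and `log(2)`, and one `NA` where the zero was; those `NA` cells are what the matrix completion step then fills in.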
@@ -282,33 +282,36 @@



#' .OptSpace : an algorithm for matrix reconstruction from a partially revealed set
#' See the ROptSpace::OptSpace version 0.2.3 for detailed manpage.
#' Let's assume an ideal matrix \eqn{M} with \eqn{(m\times n)} entries with rank \eqn{r} and
#' we are given a partially observed matrix \eqn{M\_E} which contains many missing entries.
#' Matrix reconstruction - or completion - is the task of filling in such entries.
#' OptSpace is an efficient algorithm that reconstructs \eqn{M} from \eqn{|E|=O(rn)}
#' observed elements with relative root mean square error (RMSE)
#' \deqn{RMSE \le C(\alpha)\sqrt{nr/|E|}}
#'
#' @param A an \eqn{(n\times m)} matrix whose missing entries should be flaged as NA.
#' @param ropt \code{NA} to guess the rank, or a positive integer as a pre-defined rank.
#' @param niter maximum number of iterations allowed.
#' @param tol stopping criterion for reconstruction in Frobenius norm.
#' @param showprogress a logical value; \code{TRUE} to show progress, \code{FALSE} otherwise.
#'
#' @return a named list containing
#' \describe{
#' \item{X}{an \eqn{(n \times r)} matrix as left singular vectors.}
#' \item{S}{an \eqn{(r \times r)} matrix as singular values.}
#' \item{Y}{an \eqn{(m \times r)} matrix as right singular vectors.}
#' \item{dist}{a vector containing reconstruction errors at each successive iteration.}
#' }
#' @references
#' Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh (2010).
#' Matrix Completion From a Few Entries.
#' IEEE Transactions on Information Theory 56(6):2980--2998.
#'
# .OptSpace : an algorithm for matrix reconstruction from a partially revealed set
# This function has been adapted from the original source code in the ROptSpace R package
# (version 0.2.3) by
# Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh (2010).
# See ROptSpace::OptSpace for more information.
# Assume an ideal \eqn{(n\times m)} matrix \eqn{M} of rank \eqn{r}, and that
# we are given a partially observed matrix \eqn{M\_E} which contains many missing entries.
# Matrix reconstruction (or completion) is the task of filling in such entries.
# OptSpace is an efficient algorithm that reconstructs \eqn{M} from \eqn{|E|=O(rn)}
# observed elements with relative root mean square error (RMSE)
# \deqn{RMSE \le C(\alpha)\sqrt{nr/|E|}}
#
# @param A an \eqn{(n\times m)} matrix whose missing entries should be flagged as NA.
# @param ropt \code{NA} to guess the rank, or a positive integer as a pre-defined rank.
# @param niter maximum number of iterations allowed.
# @param tol stopping criterion for reconstruction in Frobenius norm.
# @param showprogress a logical value; \code{TRUE} to show progress, \code{FALSE} otherwise.
#
# @return a named list containing
# \describe{
# \item{X}{an \eqn{(n \times r)} matrix as left singular vectors.}
# \item{S}{an \eqn{(r \times r)} matrix as singular values.}
# \item{Y}{an \eqn{(m \times r)} matrix as right singular vectors.}
# \item{dist}{a vector containing reconstruction errors at each successive iteration.}
# }
# @references
# Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh (2010).
# Matrix Completion From a Few Entries.
# IEEE Transactions on Information Theory 56(6):2980--2998.
#
.OptSpace <- function(A, ropt=NA, niter=50, tol=1e-6, showprogress=FALSE){
## Preprocessing : A : partially revealed matrix
if (!is.matrix(A)){
@@ -428,9 +431,6 @@

# compute the distortion
dist[i+1] = norm(((M_E - X%*%S%*%t(Y))*E),'f')/sqrt(nnZ.E)
#if (showprogress){
# pmsg=sprintf('* .OptSpace: Step 4: Iteration %d: distortion: %e',i,dist[i+1])
#}

if (dist[i+1]<tol){
dist = dist[1:(i+1)]
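The distortion tracked at each iteration (the `dist[i+1] = ...` line above) is the root-mean-square residual over the observed entries only, and the loop stops once it drops below `tol`. A minimal base-R sketch, assuming (as the `@param A` comment says) that missing entries are flagged as `NA`; `distortion_sketch()` and the way `E` and `M_E` are built here are illustrative assumptions, since the preprocessing code is outside this diff.

```r
# Hypothetical helper: RMS residual of the fit X %*% S %*% t(Y)
# over the observed (non-NA) entries of A.
distortion_sketch <- function(A, X, S, Y) {
  E <- 1 * !is.na(A)                 # mask: 1 = observed, 0 = missing
  M_E <- A
  M_E[is.na(M_E)] <- 0               # missing entries carry no information
  resid <- (M_E - X %*% S %*% t(Y)) * E
  norm(resid, "F") / sqrt(sum(E))    # Frobenius norm / sqrt(|E|)
}

# Exact rank-1 data with one missing cell: the fit is perfect on observed entries
X <- matrix(c(1, 2, 3), ncol = 1)
Y <- matrix(c(1, 10), ncol = 1)
S <- matrix(2)
A <- X %*% S %*% t(Y)
A[2, 1] <- NA
distortion_sketch(A, X, S, Y)        # 0
```

Multiplying by the mask `E` is what makes the missing cells irrelevant: whatever the fit predicts there, the residual is zeroed before the norm is taken.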
@@ -453,7 +453,7 @@
}


#' @keywords internal
# @keywords internal
.guess_rank <- function(X,nnz){
maxiter = 10000
n = nrow(X)
@@ -519,7 +519,7 @@


# Aux 2 : compute the distortion ------------------------------------------
#' @keywords internal
# @keywords internal
.aux_G <- function(X,m0,r){
z = rowSums(X^2)/(2*m0*r)
y = exp((z-1)^2) - 1
@@ -528,7 +528,7 @@
out = sum(y)
return(out)
}
#' @keywords internal
# @keywords internal
.aux_F_t <- function(X,Y,S,M_E,E,m0,rho){
n = nrow(X)
r = ncol(X)
@@ -542,7 +542,7 @@


# Aux 3 : compute the gradient --------------------------------------------
#' @keywords internal
# @keywords internal
.aux_Gp <- function(X,m0,r){
z = rowSums(X^2)/(2*m0*r)
z = 2*exp((z-1)^2)/(z-1)
@@ -551,7 +551,7 @@

out = (X*matrix(z,nrow=nrow(X),ncol=ncol(X),byrow=FALSE))/(m0*r)
}
#' @keywords internal
# @keywords internal
.aux_gradF_t <- function(X,Y,S,M_E,E,m0,rho){
n = nrow(X)
r = ncol(X)
@@ -578,7 +578,7 @@


# Aux 4 : Sopt given X and Y ----------------------------------------------
#' @keywords internal
# @keywords internal
.aux_getoptS <- function(X,Y,M_E,E){
n = nrow(X)
r = ncol(X)
@@ -604,7 +604,7 @@
}
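Aux 4 computes the optimal `S` for fixed `X` and `Y`. One way to see why this is a small linear problem: each observed entry of `X %*% S %*% t(Y)` is linear in the r*r entries of `S`, so `S` solves an ordinary least-squares fit over the observed cells. The sketch below builds that system with `kronecker()` and solves it with `qr.solve()`. It illustrates the objective under the same `M_E`/`E` conventions as `.aux_getoptS(X, Y, M_E, E)`, but it is not necessarily how `.aux_getoptS` computes it (its body is not shown in this diff); `opt_S_sketch()` is a hypothetical name.

```r
# Hypothetical helper: least-squares S minimising
# || (M_E - X %*% S %*% t(Y)) * E ||_F over the observed entries.
opt_S_sketch <- function(X, Y, M_E, E) {
  r <- ncol(X)
  obs <- which(E != 0, arr.ind = TRUE)       # (row, col) of observed cells
  # Entry (i, j) of X S t(Y) is sum_{a,b} X[i, a] * S[a, b] * Y[j, b], i.e. the
  # dot product of kronecker(Y[j, ], X[i, ]) with vec(S) (column-major order)
  D <- matrix(0, nrow(obs), r * r)
  for (k in seq_len(nrow(obs)))
    D[k, ] <- kronecker(Y[obs[k, 2], ], X[obs[k, 1], ])
  s <- qr.solve(D, M_E[obs])                 # least-squares solution for vec(S)
  matrix(s, nrow = r, ncol = r)
}

set.seed(1)
X <- matrix(rnorm(20), nrow = 10)            # n x r, with r = 2
Y <- matrix(rnorm(16), nrow = 8)             # m x r
S <- matrix(c(3, 0.5, -1, 2), nrow = 2)
E <- matrix(1, 10, 8)
E[c(3, 17, 25, 40, 58, 71)] <- 0             # hide six entries
M_E <- (X %*% S %*% t(Y)) * E
opt_S_sketch(X, Y, M_E, E)                   # recovers S up to rounding
```

The system has only r*r unknowns, so it stays cheap even when the number of observed cells is large; `qr.solve()` stops with an error if the observed cells do not determine `S`.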

# Aux 5 : optimal line search ---------------------------------------------
#' @keywords internal
# @keywords internal
.aux_getoptT <- function(X,W,Y,Z,S,M_E,E,m0,rho){
norm2WZ = (norm(W,'f')^2)+(norm(Z,'f')^2)
f = array(0,c(1,21))
24 changes: 13 additions & 11 deletions man/decostand.Rd
@@ -109,24 +109,26 @@ decobackstand(x, zap = TRUE)
the \code{rclr} method for one available solution.

\item \code{rclr}: robust clr ("rclr") is similar to regular clr
(see above) but allows data that contains zeroes. This method
does not use pseudocounts, unlike the standard clr.
The robust clr (rclr) divides the values by geometric mean
(see above) but allows data with zeroes. This method can avoid the use of pseudocounts,
unlike the standard clr. The robust clr (rclr) divides the values by the geometric mean
of the observed features and then performs matrix completion for the zero entries.
In high dimensional data,
the geometric mean of rclr approximates the true
In high-dimensional data, the geometric mean of rclr approximates the true
geometric mean; see e.g. Martino et al. (2019).
The \code{rclr} transformation is defined formally as follows:
\deqn{rclr = log\frac{x}{g(x > 0)}}{%
rclr = log(x/g(x > 0))}
where \eqn{x} is a single value, and \eqn{g(x > 0)} is the geometric
mean of sample-wide values \eqn{x} that are positive (> 0). The OptSpace algorithm us
used for matrix completion for the missing values that result from log transformation of
the zero entries in the original input data. The vegan implementation in vegan is
modified from the original implementation ROptSpace::OptSpace version 0.2.3 following
Keshavan et al. (2010).
mean of sample-wide values \eqn{x} that are positive (> 0). The OptSpace algorithm is
used for matrix completion of the missing values that result from log transformation of
the zero entries in the original input data. The vegan implementation has been
adapted from the original implementation ROptSpace::OptSpace (version 0.2.3), which follows
Keshavan et al. (2010). The following parameters can be passed to OptSpace through
\code{decostand}: \code{ropt}, \code{NA} to guess the rank or a positive integer giving
a pre-defined rank (default: \code{NA}); \code{niter}, the maximum number of iterations
allowed (default: 50); \code{tol}, the stopping criterion for reconstruction in Frobenius
norm (default: 1e-6); \code{showprogress}, a logical value, \code{TRUE} to show progress
and \code{FALSE} otherwise (default: \code{FALSE}).
}

Standardization, as contrasted to transformation, means that the
entries are transformed relative to other entries.

