matrix completion documentation

vegandevs · Feb 12, 2024 · 2fa6867
1 parent a28cd54
commit 2fa6867
Showing 3 changed files with 59 additions and 51 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -37,11 +37,17 @@ Authors@R: c(person("Jari", "Oksanen", role=c("aut","cre"),
 	   person("Cajo J.F.", "Ter Braak", role="aut"),
 	   person("James", "Weedon", role="aut"))
 Depends: permute (>= 0.9-0), lattice, R (>= 3.6.0)
-Suggests: parallel, tcltk, knitr, markdown
+Suggests: 
+    parallel,
+    tcltk,
+    knitr,
+    markdown,
+    testthat (>= 3.0.0)
 Imports: MASS, cluster, mgcv
 VignetteBuilder: utils, knitr
 Description: Ordination methods, diversity analysis and other
   functions for community and vegetation ecologists.
 License: GPL-2
 BugReports: https://github.com/vegandevs/vegan/issues
 URL: https://github.com/vegandevs/vegan
+Config/testthat/edition: 3
diff --git a/R/decostand.R b/R/decostand.R
@@ -188,10 +188,10 @@
      stop("Some samples do not contain observations and rclr cannot be calculated.")
    }
 
-   ## Divide all values by their sample-wide geometric means
+   ## Divide all values by their sample-wide geometric means (a subtraction in log-space)
    xx <- clog - means
 
-   ## If there were zeros, there are infinite values after logarithmic transform.
+   ## Zeros become infinite after the log transform.
    ## Convert those to NA
    xx[is.infinite(xx)] <- NA
    attr(xx, "parameters") <- list("means" = means)
@@ -282,33 +282,36 @@
 
 
 
-#' .OptSpace : an algorithm for matrix reconstruction from a partially revealed set
-#' See the ROptSpace::OptSpace version 0.2.3 for detailed manpage.
-#' Let's assume an ideal matrix \eqn{M} with \eqn{(m\times n)} entries with rank \eqn{r} and
-#' we are given a partially observed matrix \eqn{M\_E} which contains many missing entries.
-#' Matrix reconstruction - or completion - is the task of filling in such entries.
-#' OptSpace is an efficient algorithm that reconstructs \eqn{M} from \eqn{|E|=O(rn)}
-#' observed elements with relative root mean square error (RMSE)
-#' \deqn{RMSE \le C(\alpha)\sqrt{nr/|E|}}
-#'
-#' @param A an \eqn{(n\times m)} matrix whose missing entries should be flaged as NA.
-#' @param ropt \code{NA} to guess the rank, or a positive integer as a pre-defined rank.
-#' @param niter maximum number of iterations allowed.
-#' @param tol stopping criterion for reconstruction in Frobenius norm.
-#' @param showprogress a logical value; \code{TRUE} to show progress, \code{FALSE} otherwise.
-#'
-#' @return a named list containing
-#' \describe{
-#' \item{X}{an \eqn{(n \times r)} matrix as left singular vectors.}
-#' \item{S}{an \eqn{(r \times r)} matrix as singular values.}
-#' \item{Y}{an \eqn{(m \times r)} matrix as right singular vectors.}
-#' \item{dist}{a vector containing reconstruction errors at each successive iteration.}
-#' }
-#' @references
-#' Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh (2010).
-#' Matrix Completion From a Few Entries.
-#' IEEE Transactions on Information Theory 56(6):2980--2998.
-#'
+# .OptSpace : an algorithm for matrix reconstruction from a partially revealed set
+# This function has been adapted from the original source code in the ROptSpace R package
+# (version 0.2.3) by
+# Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh (2010).
+# See ROptSpace::OptSpace for more information.
+# Let's assume an ideal matrix \eqn{M} with \eqn{(m\times n)} entries with rank \eqn{r} and
+# we are given a partially observed matrix \eqn{M\_E} which contains many missing entries.
+# Matrix reconstruction - or completion - is the task of filling in such entries.
+# OptSpace is an efficient algorithm that reconstructs \eqn{M} from \eqn{|E|=O(rn)}
+# observed elements with relative root mean square error (RMSE)
+# \deqn{RMSE \le C(\alpha)\sqrt{nr/|E|}}
+#
+# @param A an \eqn{(n\times m)} matrix whose missing entries should be flagged as NA.
+# @param ropt \code{NA} to guess the rank, or a positive integer as a pre-defined rank.
+# @param niter maximum number of iterations allowed.
+# @param tol stopping criterion for reconstruction in Frobenius norm.
+# @param showprogress a logical value; \code{TRUE} to show progress, \code{FALSE} otherwise.
+#
+# @return a named list containing
+# \describe{
+# \item{X}{an \eqn{(n \times r)} matrix as left singular vectors.}
+# \item{S}{an \eqn{(r \times r)} matrix as singular values.}
+# \item{Y}{an \eqn{(m \times r)} matrix as right singular vectors.}
+# \item{dist}{a vector containing reconstruction errors at each successive iteration.}
+# }
+# @references
+# Raghunandan H. Keshavan, Andrea Montanari, Sewoong Oh (2010).
+# Matrix Completion From a Few Entries.
+# IEEE Transactions on Information Theory 56(6):2980--2998.
+#
 .OptSpace <- function(A, ropt=NA, niter=50, tol=1e-6, showprogress=FALSE){
   ## Preprocessing : A     : partially revealed matrix
   if (!is.matrix(A)){
@@ -428,9 +431,6 @@
 
     # compute the distortion
     dist[i+1] = norm(((M_E - X%*%S%*%t(Y))*E),'f')/sqrt(nnZ.E)
-    #if (showprogress){
-    #  pmsg=sprintf('* .OptSpace: Step 4: Iteration %d: distortion: %e',i,dist[i+1])
-    #}
 
     if (dist[i+1]<tol){
       dist = dist[1:(i+1)]
@@ -453,7 +453,7 @@
 }
 
 
-#' @keywords internal
+# @keywords internal
 .guess_rank <- function(X,nnz){
   maxiter = 10000
   n = nrow(X)
@@ -519,7 +519,7 @@
 
 
 # Aux 2 : compute the distortion ------------------------------------------
-#' @keywords internal
+# @keywords internal
 .aux_G <- function(X,m0,r){
   z = rowSums(X^2)/(2*m0*r)
   y = exp((z-1)^2) - 1
@@ -528,7 +528,7 @@
   out = sum(y)
   return(out)
 }
-#' @keywords internal
+# @keywords internal
 .aux_F_t <- function(X,Y,S,M_E,E,m0,rho){
   n = nrow(X)
   r = ncol(X)
@@ -542,7 +542,7 @@
 
 
 # Aux 3 : compute the gradient --------------------------------------------
-#' @keywords internal
+# @keywords internal
 .aux_Gp <- function(X,m0,r){
   z = rowSums(X^2)/(2*m0*r)
   z = 2*exp((z-1)^2)/(z-1)
@@ -551,7 +551,7 @@
 
   out = (X*matrix(z,nrow=nrow(X),ncol=ncol(X),byrow=FALSE))/(m0*r)
 }
-#' @keywords internal
+# @keywords internal
 .aux_gradF_t <- function(X,Y,S,M_E,E,m0,rho){
   n = nrow(X)
   r = ncol(X)
@@ -578,7 +578,7 @@
 
 
 # Aux 4 : Sopt given X and Y ----------------------------------------------
-#' @keywords internal
+# @keywords internal
 .aux_getoptS <- function(X,Y,M_E,E){
   n = nrow(X)
   r = ncol(X)
@@ -604,7 +604,7 @@
 }
 
 # Aux 5 : optimal line search ---------------------------------------------
-#' @keywords internal
+# @keywords internal
 .aux_getoptT <- function(X,W,Y,Z,S,M_E,E,m0,rho){
   norm2WZ = (norm(W,'f')^2)+(norm(Z,'f')^2)
   f = array(0,c(1,21))
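The distortion that `.OptSpace` tracks per iteration (the `dist[i+1]` line in the hunk above) is the Frobenius norm of the residual on observed entries, scaled by the square root of the number of observed entries. The following base R sketch reproduces that one expression on a toy problem. The helper name `distortion` and all input values are hypothetical and are not part of vegan; only the formula is taken from the diff.

```r
# Hypothetical toy inputs; none of these values come from vegan.
set.seed(1)
n <- 6; m <- 5; r <- 2
X <- qr.Q(qr(matrix(rnorm(n * r), n, r)))   # n x r, orthonormal columns
Y <- qr.Q(qr(matrix(rnorm(m * r), m, r)))   # m x r, orthonormal columns
S <- diag(c(3, 1))                          # r x r
M <- X %*% S %*% t(Y)                       # rank-r reference matrix

E <- matrix(1, n, m)                        # 1 = observed, 0 = missing
E[cbind(c(1, 3, 6), c(2, 4, 5))] <- 0       # hide three entries
M_E <- M * E                                # observed entries only
nnZ.E <- sum(E)                             # number of observed entries

# Same expression as the dist[i+1] line in the diff, wrapped in a
# hypothetical helper for illustration.
distortion <- function(X, S, Y, M_E, E, nnZ.E) {
  norm((M_E - X %*% S %*% t(Y)) * E, "f") / sqrt(nnZ.E)
}

d_exact <- distortion(X, S, Y, M_E, E, nnZ.E)      # factors reproduce M
d_off   <- distortion(X, 2 * S, Y, M_E, E, nnZ.E)  # deliberately wrong S
```

Because the residual is multiplied by `E`, hidden entries do not contribute: `d_exact` is zero when the factors reproduce the observed entries, while `d_off` is strictly positive. The loop in the diff stops once this value falls below `tol`.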

diff --git a/man/decostand.Rd b/man/decostand.Rd
@@ -109,24 +109,26 @@ decobackstand(x, zap = TRUE)
      the \code{rclr} method for one available solution.
 
    \item \code{rclr}: robust clr ("rclr") is similar to regular clr
-     (see above) but allows data that contains zeroes. This method
-     does not use pseudocounts, unlike the standard clr.
-     The robust clr (rclr) divides the values by geometric mean
+     (see above) but allows data with zeroes. This method can avoid the use of pseudocounts,
+     unlike the standard clr. The robust clr (rclr) divides the values by geometric mean
      of the observed features and then performs matrix completion for the zero entries.
-     In high dimensional data,
-     the geometric mean of rclr approximates the true
+     In high dimensional data the geometric mean of rclr approximates the true
      geometric mean; see e.g. Martino et al. (2019).
      The \code{rclr} transformation is defined formally as follows:
      \deqn{rclr = log\frac{x}{g(x > 0)}}{%
      rclr = log(x/g(x > 0))}
      where \eqn{x} is a single value, and \eqn{g(x > 0)} is the geometric 
-     mean of sample-wide values \eqn{x} that are positive (> 0). The OptSpace algorithm us
-     used for matrix completion for the missing values that result from log transformation of
-     the zero entries in the original input data. The vegan implementation in vegan is
-     modified from the original implementation ROptSpace::OptSpace version 0.2.3 following
-     Keshavan et al. (2010).
+     mean of sample-wide values \eqn{x} that are positive (> 0). The OptSpace algorithm is
+     used for matrix completion of the missing values that result from log transformation of
+     the zero entries in the original input data. The vegan implementation has been
+     modified from the original implementation ROptSpace::OptSpace (version 0.2.3) following
+     Keshavan et al. (2010). The following parameters can be passed to OptSpace through \code{decostand}:
+     \code{ropt}: \code{NA} to guess the rank, or a positive integer as a pre-defined rank (default: \code{NA});
+     \code{niter}: maximum number of iterations allowed (default: 50);
+     \code{tol}: stopping criterion for reconstruction in Frobenius norm (default: 1e-6);
+     \code{showprogress}: a logical value; \code{TRUE} to show progress, \code{FALSE} otherwise (default: \code{FALSE}).
   }
-  
+
   Standardization, as contrasted to transformation, means that the
   entries are transformed relative to other entries.
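The rclr definition in the help text above, log(x / g(x > 0)), can be illustrated with a short base R sketch. This is not the vegan implementation: the function name `rclr_sketch` and the example matrix are hypothetical, and the sketch stops before the OptSpace matrix completion step, so zero entries are left as `NA`.

```r
# Hypothetical count table: rows are samples, columns are features.
x <- matrix(c(1, 2, 4, 0,
              3, 0, 3, 3,
              5, 5, 0, 0),
            nrow = 3, byrow = TRUE)

# Sketch of the log-ratio step only; assumes non-negative data and at
# least one positive value per sample.
rclr_sketch <- function(x) {
  stopifnot(is.matrix(x), all(x >= 0), all(rowSums(x > 0) > 0))
  clog <- log(x)                         # zeros become -Inf
  clog[is.infinite(clog)] <- NA          # mask zeros before averaging
  means <- rowMeans(clog, na.rm = TRUE)  # log of g(x > 0) for each sample
  clog - means                           # subtract each row's mean; NA stays NA
}

xx <- rclr_sketch(x)
```

Subtracting the row mean of the logs is the log-space form of dividing by the geometric mean of the positive values, so the non-missing entries of each row sum to zero. In vegan, the remaining `NA` entries are then filled by OptSpace, with `ropt`, `niter`, `tol` and `showprogress` passed through `decostand` as described in the help text.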