Title: | Big Quadratic Inverse Covariance Estimation |
---|---|
Description: | Use Newton's method, coordinate descent, and METIS clustering to solve the L1 regularized Gaussian MLE inverse covariance matrix estimation problem. |
Authors: | Khalid B. Kunji [aut, cre], Cho-Jui Hsieh [ctb], Matyas A. Sustik [ctb], Inderjit S. Dhillon [ctb], Pradeep Ravikumar [ctb], Tuo Zhao [ctb], Xingguo Li [ctb], Han Liu [ctb], Kathryn Roeder [ctb], John Lafferty [ctb], Larry Wasserman [ctb], George Karypis [ctb], Melissa O'Neill [ctb], Richard Henderson [ctb] |
Maintainer: | Khalid B. Kunji <[email protected]> |
License: | GPL (>= 3) | file LICENSE |
Version: | 1.1-13 |
Built: | 2024-11-26 04:06:53 UTC |
Source: | https://github.com/cran/BigQuic |
Use Newton's method, coordinate descent, and METIS clustering to solve the L1 regularized Gaussian MLE inverse covariance matrix estimation problem.
BigQuic (X = NULL, inputFileName = NULL, outputFileName = NULL, lambda = 0.5, numthreads = 4, maxit = 5, epsilon = 1e-3, k = 0, memory_size = 8000, verbose = 0, isnormalized = 1, seed = NULL, use_ram = FALSE)
BigQuic (X = NULL, inputFileName = NULL, outputFileName = NULL, lambda = 0.5, numthreads = 4, maxit = 5, epsilon = 1e-3, k = 0, memory_size = 8000, verbose = 0, isnormalized = 1, seed = NULL, use_ram = FALSE)
X |
An n rows by p columns matrix of the data without the response vector (Y). |
inputFileName |
Full path to a file containing X in the format used for input to BigQUIC, p n X An example is installed, you can get its path with: paste(path.package("BigQuic"), "/extdata/testInput", sep = "") |
outputFileName |
Location and name of output file that will be extrapolated for their naming, e.g. /home/username/test when 3 files are being output will result in /home/username/test.1.output /home/username/test.2.output and /home/username/test.3.output |
lambda |
The tuning parameter 0 <= lambda <= 1, but small values should not be used for performance reasons, e.g. < .4 or so. A vector of lambdas may also be input, in which case BigQUIC will be run for each lambda. Yes, the examples shows lambda as small as 0.1, but that is only because the testInput matrix is very small so the small lambdas can still finish in a sensible amount of time. |
numthreads |
Number of threads to use for this computation. |
maxit |
Maximum number of Newton iterations. |
epsilon |
Convergence tolerance. |
k |
Number of memory blocks to use, ideally should be the smallest k such that p/k columns fit in the memory_size. |
memory_size |
The amount of memory this computation is constrained to. |
verbose |
Controls how verbose messages should be printed during execution. Valid value range: 0–4. Higher numbers will give more messages for debugging. |
isnormalized |
Whether or not the input is already normalized. |
seed |
A seed for the random number generation, useful for replicating results. |
use_ram |
By default the results are written into files, using this option will load those files back to R and return them instead of their paths (the default behavior). When doing this there is a possibility that R will crash if you don't have enough RAM, use with caution on larger data sets or with many lambdas. |
BigQUIC is finally here! The original authors of QUIC and BigQUIC brought QUIC to Matlab (MEX), Standalone (C++), and R, but BigQUIC was delivered for Matlab and Standalone only with no R package. There are also some other features to the package, including sample data generation, inverse selection, and plotting. IMPORTANT: Due to the practicalities of formatting and working with large data sets, files are written to disk at various times when using BigQuic. The locations of the files BigQuic wrote to disk are kept in the object returned by BigQuic. They can be deleted when you're finished with the BigQuic_object manually by using the cleanFiles() function as shown in the examples. There are basiclly 8 cases for file creation, the following will give you an idea of where they are in case R crashes completely and loses the references to the files so you need to delete them manually. Files created in tmp are deleted on reboot, so no worries if you're having trouble finding them.
1. X, output file, use_ram = TRUE length(lambda) output files created in output location 1 file created for X in tmp Note: this is the same as 5, use_ram doesn't matter in this case
2. input file, no output file, use_ram = FALSE length(lambda) output files in location of input file
3. input file, output file, use_ram = FALSE length(lambda)a output files in location of output file Same as 8, use_ram doesn't matter in this case
4. X, no output file, use_ram = FALSE length(lambda) output files in tmp 1 file created for X in tmp Also same as 1 and 5
5. X, output file, use_ram = FALSE length(lambda) output files created in output file location 1 file created for X in tmp
6. X, no output file, use_ram = TRUE 1 file created for X in tmp
7. input file, no output file, use_ram = TRUE no files created
8. input file, output file, use_ram = TRUE length(lambda) output files created in output file location
An object with Reference Class "BigQuic_object"
X |
The X input for BigQuic, if given |
inputFileName |
The file name input for BigQuic, if given |
isnormalized |
Whether or not the input data was previously normalized |
k |
k used in BigQuic |
epsilon |
The epsilon that was used in this run of BigQuic |
lambda |
lambda used in BigQuic |
maxit |
maxit used in BigQuic |
memory_size |
memory_size used in BigQuic |
numthreads |
numthreads used in BigQuic |
seed |
seed used in BigQuic |
use_ram |
use_ram used in BigQuic |
verbose |
level of verbosity used in BigQuic |
opt.lambda |
The selected optimal lambda value, initially empty, it will be filled in by running BigQuic.select on the object, see the use in the Examples below |
precision_matrices |
The precision matrix for each of the lambdas in a list, so to access
the one for the 1st lambda in the example:
|
output_file_names |
Lists files created by the class |
clean |
Indicates whether or not cleanFiles() has been called on this object before |
inFlag |
An internal indicator for the class |
outFlag |
An internal indicator for the class |
getClass |
Returns Class method definition |
cleanFiles |
Deletes files created by the class, except for those intentionally output by specifying an output file name |
setX |
Used internall to set X |
setOptLambda |
used internally to set opt.lambda |
setSeed |
used internally to set the seed |
.self |
returns the object itself again |
.refClassDef |
Lists fields and methods of the reference class |
Khalid B. Kunji [aut, cre], Cho-Jui Hsieh [ctb], Matyas A. Sustik [ctb], Inderjit S. Dhillon [ctb], Pradeep Ravikumar [ctb], Tuo Zhao [ctb], Xingguo Li [ctb], Han Liu [ctb], Kathryn Roeder [ctb], John Lafferty [ctb], Larry Wasserman [ctb], George Karypis [ctb], Melissa O'Neill [ctb], Richard Henderson [ctb]
Maintainer: Khalid B. Kunji <[email protected]>
BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables (pdf) C. Hsieh, M. Sustik, I. Dhillon, P. Ravikumar, R. Poldrack. In Neural Information Processing Systems (NIPS), December 2013. (Oral) http://www.cs.utexas.edu/~cjhsieh/hugeQUIC.pdf http://bigdata.ices.utexas.edu/publication/big-quic-sparse-inverse-covariance-estimation-for-a-million-variables-2/
QUIC: Quadratic Approximation for Sparse Inverse Covariance Matrix Estimation (pdf) C. Hsieh, M. Sustik, I. Dhillon, P. Ravikumar. Journal of Machine Learning Research (JMLR), October 2014. http://jmlr.org/papers/volume15/hsieh14a/hsieh14a.pdf http://bigdata.ices.utexas.edu/publication/quic-quadratic-approximation-for-sparse-inverse-covariance-matrix-estimation-2/
METIS:"A Fast and Highly Quality Multilevel Scheme for Partitioning Irregular Graphs". George Karypis and Vipin Kumar. SIAM Journal on Scientific Computing, Vol. 20, No. 1, pp. 359-392, 1999. http://glaros.dtc.umn.edu/gkhome/fetch/papers/mlSIAMSC99.pdf
PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation. This paper is currently submitted to ACM Transactions on Mathematical Software, where it is currently under review. http://www.pcg-random.org/pdf/toms-oneill-pcg-family-v1.02.pdf
lambda <- 0.91 exampleResult <- BigQuic(inputFileName = paste(path.package("BigQuic"), "/extdata/testInput", sep = ""), outputFileName = tempfile(pattern = "BigQuic_output_matrix", fileext = ".Bmat"), lambda = lambda, numthreads = 1, memory_size = 512, seed = 1, use_ram = TRUE) BigQuic.select(exampleResult) plot(exampleResult) exampleResult$cleanFiles() ## Not run: If you have the hdi package installed: library(hdi) data(riboflavin) lambda <- seq(from = 0.9, to = 0.99, by = 0.01) exampleResult <- BigQuic(as.matrix(riboflavin), lambda = lambda, numthreads = 1, memory_size = 512, seed = 1, use_ram = TRUE) BigQuic.select(exampleResult) plot(exampleResult) ## End(Not run)
lambda <- 0.91 exampleResult <- BigQuic(inputFileName = paste(path.package("BigQuic"), "/extdata/testInput", sep = ""), outputFileName = tempfile(pattern = "BigQuic_output_matrix", fileext = ".Bmat"), lambda = lambda, numthreads = 1, memory_size = 512, seed = 1, use_ram = TRUE) BigQuic.select(exampleResult) plot(exampleResult) exampleResult$cleanFiles() ## Not run: If you have the hdi package installed: library(hdi) data(riboflavin) lambda <- seq(from = 0.9, to = 0.99, by = 0.01) exampleResult <- BigQuic(as.matrix(riboflavin), lambda = lambda, numthreads = 1, memory_size = 512, seed = 1, use_ram = TRUE) BigQuic.select(exampleResult) plot(exampleResult) ## End(Not run)
Creates reference class objects (... which are really environments) of type BigQuic_object.
"BigQuic_object"
Reference Class that holds all the relevant results of the BigQuic computation.
All reference classes extend and inherit methods from "envRefClass"
.
precision_matrices
:Object of class list
~~
X
:Object of class matrix
~~
inputFileName
:Object of class character
~~
lambda
:Object of class numeric
~~
numthreads
:Object of class numeric
~~
maxit
:Object of class numeric
~~
epsilon
:Object of class numeric
~~
k
:Object of class numeric
~~
memory_size
:Object of class numeric
~~
verbose
:Object of class numeric
~~
isnormalized
:Object of class numeric
~~
seed
:Object of class numeric
~~
use_ram
:Object of class logical
~~
clean
:Object of class logical
~~
output_file_names
:Object of class character
~~
opt.lambda
:Object of class numeric
~~
inFlag
:Object of class logical
~~
outFlag
:Object of class logical
~~
cleanFiles(verbose)
:~~
setOptLambda(optLambda)
:~~
setX(inputX)
:~~
setSeed(inputSeed)
:~~
showClass("BigQuic_object")
showClass("BigQuic_object")
Selects the optimal lambda value from those in the BigQuic_object, i.e. BigQuic Result.
BigQuic.select(BigQuic_result = NULL, stars.thresh = 0.1, stars.subsample.ratio = NULL, rep.num = 20, verbose = TRUE, verbose2 = 0)
BigQuic.select(BigQuic_result = NULL, stars.thresh = 0.1, stars.subsample.ratio = NULL, rep.num = 20, verbose = TRUE, verbose2 = 0)
BigQuic_result |
A BigQuic_object returned from running BigQuic. |
stars.thresh |
The threshold used in the Stars selection method for choosing a lambda |
stars.subsample.ratio |
The ratio giving how large the subsamples will be for Stars, if null there is a heuristic calculation. |
rep.num |
Number of times to do the repetition in Stars. |
verbose |
Controls the level of verbosity in a part of the code. |
verbose2 |
Controls the level of verbosity in another section of code. |
Generates a sample data set for using with BigQuic, the default seed is 1 for reproducibility. For high dimensional data, choose p much larger than n.
generate_sample(n = 200, p = 150, seed = NULL)
generate_sample(n = 200, p = 150, seed = NULL)
n |
The number of rows in the resulting data set. |
p |
The number of columns in the resulting data set. |
seed |
A seed for the random number generator in R. |
Makes plot of the precision matrix showing non-zero values. The diagonal is shown in only black because the agreement with itself is not highly interesting. Negative relations are shown in green and positive in red. The saturation indicates the normalized strength of the relation. The matrix is symmetric and technically only the lower or upper triangle would suffice to provide identical information.
## S3 method for class 'BigQuic_object' plot(x, ...)
## S3 method for class 'BigQuic_object' plot(x, ...)
x |
The BigQuic object, which will have its optimal precision matrix plotted. |
... |
plot can take a variety of arguments depending on the type, that is represented by ... |