Title: | Tuning of the Latent Dirichlet Allocation Models Parameters |
---|---|
Description: | This library estimates the best fitting number of topics. |
Authors: | Murzintcev Nikita [aut] |
Maintainer: | Nathan Chaney <[email protected]> |
License: | BSD_2_clause + file LICENSE |
Version: | 1.0.3 |
Built: | 2025-02-25 03:00:21 UTC |
Source: | https://github.com/nikita-moor/ldatuning |
Implements the Arun2010 scoring algorithm.
Arun2010(models, dtm)
models |
An object of class "LDA". |
dtm |
An object of class "DocumentTermMatrix" with term-frequency weighting or an object coercible to a "simple_triplet_matrix" with integer entries. |
A scalar LDA model score
Implements the CaoJuan2009 scoring algorithm.
CaoJuan2009(models)
models |
An object of class "LDA". |
A scalar LDA model score
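For orientation, the CaoJuan2009 metric is usually described as an average pairwise cosine similarity between topic-term distributions, with lower values suggesting better-separated topics. The base R sketch below illustrates that idea under this assumption; it is not the package's code, and `avg_cosine_similarity` and the toy matrices are hypothetical.

```r
# Hypothetical helper: mean cosine similarity over all pairs of topics.
# `beta` is a topics x terms matrix of probabilities (each row sums to 1).
avg_cosine_similarity <- function(beta) {
  pairs <- utils::combn(nrow(beta), 2)       # every unordered pair of topics
  sims <- apply(pairs, 2, function(p) {
    x <- beta[p[1], ]
    y <- beta[p[2], ]
    sum(x * y) / sqrt(sum(x^2) * sum(y^2))   # cosine similarity of the pair
  })
  mean(sims)
}

# Toy topic-term matrices (illustrative values only).
separated   <- rbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1))
overlapping <- rbind(c(0.5, 0.5, 0), c(0.5, 0.5, 0), c(0.4, 0.6, 0))

score_separated   <- avg_cosine_similarity(separated)
score_overlapping <- avg_cosine_similarity(overlapping)
```

In this toy case the fully separated topics score lower than the overlapping ones, which is the direction a "lower is better" reading expects.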
Implements the Deveaud2014 scoring algorithm.
Deveaud2014(models)
models |
An object of class "LDA". |
A scalar LDA model score
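As a rough illustration, the Deveaud2014 metric is usually described as an average information divergence between pairs of topics, with higher values suggesting more distinct topics. The base R sketch below uses a symmetrised Kullback-Leibler divergence as a stand-in; the exact divergence and normalisation used by the package are assumptions here, and `avg_pairwise_divergence` and the toy matrices are hypothetical.

```r
# Hypothetical helper: mean symmetrised KL divergence over topic pairs.
# `beta` is a topics x terms matrix of strictly positive probabilities.
avg_pairwise_divergence <- function(beta) {
  pairs <- utils::combn(nrow(beta), 2)       # every unordered pair of topics
  div <- apply(pairs, 2, function(p) {
    x <- beta[p[1], ]
    y <- beta[p[2], ]
    0.5 * sum(x * log(x / y)) + 0.5 * sum(y * log(y / x))
  })
  mean(div)
}

# Toy topic-term matrices (illustrative values only).
identical_topics <- rbind(c(0.2, 0.3, 0.5), c(0.2, 0.3, 0.5))
distinct_topics  <- rbind(c(0.8, 0.1, 0.1), c(0.1, 0.1, 0.8))

score_identical <- avg_pairwise_divergence(identical_topics)
score_distinct  <- avg_pairwise_divergence(distinct_topics)
```

Identical topics give a divergence of zero, and the score grows as the topics move apart.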
Calculates different metrics to estimate the most preferable number of topics for LDA model.
FindTopicsNumber(
  dtm,
  topics = seq(10, 40, by = 10),
  metrics = "Griffiths2004",
  method = "Gibbs",
  control = list(),
  mc.cores = NA,
  return_models = FALSE,
  verbose = FALSE,
  libpath = NULL
)
dtm |
An object of class "DocumentTermMatrix" with term-frequency weighting or an object coercible to a "simple_triplet_matrix" with integer entries. |
topics |
Vector of topic counts to compare; a model is fitted and scored for each value. |
metrics |
String or vector of possible metrics: "Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014". |
method |
The method to be used for fitting; see LDA. |
control |
A named list of the control parameters for estimation or an object of class "LDAcontrol". |
mc.cores |
NA, an integer, or a cluster; controls how many CPU cores process models simultaneously. If an integer, a cluster with that many cores is created on the local machine. If a cluster, it is used but not destroyed (this allows multiple-node clusters). Defaults to NA, which triggers auto-detection of the number of cores on the local machine. |
return_models |
Whether or not to return the model objects of class "LDA". Defaults to false. Setting to true requires the tibble package. |
verbose |
If false (default), suppress all warnings and additional information. |
libpath |
Path to R packages (use only if your R installation can't find the 'topicmodels' package; see [issue #3](https://github.com/nikita-moor/ldatuning/issues/3)). For example: "C:/Program Files/R/R-2.15.2/library" (Windows), "/home/user/R/x86_64-pc-linux-gnu-library/3.2" (Linux). |
Data frame with the numbers of topics and the corresponding values of one or more metrics. Can be passed directly to FindTopicsNumber_plot to draw a plot.
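A returned data frame of this shape can also be inspected directly. The sketch below builds a hypothetical result by hand (the numbers are made up, not package output) and picks a candidate topic count per metric. It assumes the usual reading of these metrics: Griffiths2004 is maximised and CaoJuan2009 is minimised.

```r
# Hypothetical result in the documented shape: a 'topics' column plus one
# column per metric. The values are invented for illustration.
result <- data.frame(
  topics        = c(10, 20, 30, 40),
  Griffiths2004 = c(-5200, -5050, -5010, -5080),
  CaoJuan2009   = c(0.21, 0.14, 0.11, 0.16)
)

# Assumption: Griffiths2004 is read as "higher is better",
# CaoJuan2009 as "lower is better".
best_griffiths <- result$topics[which.max(result$Griffiths2004)]
best_caojuan   <- result$topics[which.min(result$CaoJuan2009)]
```

When metrics disagree, looking at the whole curve is usually more informative than a single extremum.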
## Not run:
library(ldatuning)
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:10, ]
FindTopicsNumber(dtm, topics = 2:10, metrics = "Arun2010", mc.cores = 1L)
## End(Not run)
Support function to analyze the optimal number of topics. Uses the output of the FindTopicsNumber function.
FindTopicsNumber_plot(values)
values |
Data frame whose first column is named 'topics' and whose remaining columns hold the values of the metrics. |
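Metrics in such a data frame live on very different scales, so comparing them on one axis needs some normalisation. The base R sketch below shows one plausible approach, min-max rescaling of each metric column. Whether FindTopicsNumber_plot applies exactly this transformation is an assumption; `rescale_metrics` and the sample values are hypothetical.

```r
# Hypothetical helper: rescale every metric column to [0, 1], leaving the
# 'topics' column untouched. A constant column would yield NaN.
rescale_metrics <- function(values) {
  metric_cols <- setdiff(names(values), "topics")
  values[metric_cols] <- lapply(values[metric_cols], function(col) {
    (col - min(col)) / (max(col) - min(col))
  })
  values
}

# Invented values in the documented shape.
values <- data.frame(
  topics        = c(2, 4, 6),
  Arun2010      = c(30, 20, 25),
  Griffiths2004 = c(-900, -850, -870)
)

scaled <- rescale_metrics(values)
```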
## Not run:
library(ldatuning)
library(topicmodels)
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:10, ]
optimal.topics <- FindTopicsNumber(dtm, topics = 2:10,
  metrics = c("Arun2010", "CaoJuan2009", "Griffiths2004"))
FindTopicsNumber_plot(optimal.topics)
## End(Not run)
Implements the Griffiths2004 scoring algorithm. To use this algorithm, the LDA model MUST be fitted with the keep control parameter greater than 0 (defaults to 50) so that the logLiks vector is retained.
Griffiths2004(models, control)
models |
An object of class "LDA". |
control |
A named list of the control parameters for estimation or an object of class "LDAcontrol". |
A scalar LDA model score
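The reason the logLiks vector must be retained is that this kind of score is computed from sampled log-likelihoods. The Griffiths2004 approach is commonly described as a harmonic-mean estimate of the model likelihood; the base R sketch below shows that calculation on the log scale. It is an illustration under that assumption, not the package's implementation (which may, for example, discard burn-in samples or use higher-precision arithmetic). `log_harmonic_mean` and the sample values are hypothetical.

```r
# Hypothetical helper: log of the harmonic mean of likelihoods, computed
# from log-likelihood samples. Centring on the median keeps exp() from
# overflowing or underflowing for moderately spread values.
log_harmonic_mean <- function(log_liks) {
  centre <- stats::median(log_liks)
  centre - log(mean(exp(-log_liks + centre)))
}

# Invented log-likelihood samples, standing in for a retained logLiks vector.
log_liks <- c(-1000.5, -1001.2, -999.8, -1000.9)
score <- log_harmonic_mean(log_liks)
```

The harmonic mean is dominated by the smallest likelihoods, so the score always lies between the minimum and maximum of the sampled log-likelihoods and is pulled toward the lower end.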
A package for identifying the number of topics in a text corpus by generating LDA models, tuning LDA model parameters, and scoring model results.
Maintainer: Nathan Chaney [email protected] (ORCID) [contributor]
Authors:
Murzintcev Nikita [email protected] (ORCID)
Useful links:
Report bugs at https://github.com/nikita-moor/ldatuning/issues