Latent Dirichlet Allocation

Latent Dirichlet allocation (LDA) is a generative probabilistic model for collections of documents such as text or image corpora, presented by David Blei, Andrew Ng and Michael I. Jordan in 2002. Each element of the corpus (usually called a document) is treated as a mixture of several underlying topics (latent topics). Each observed word in a document is in turn attributed to one or more of these topics. The topics, whose number is fixed in advance, explain similarities between documents. In image corpora, possible topics might be sky, meadow or road; in text corpora they tend to be more abstract, such as sports, politics or education.
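The generative view described above can be illustrated with a minimal Python sketch. It draws a topic mixture for one document from a Dirichlet distribution and then, for each word position, first picks a topic and then a word from that topic. The toy topics, their vocabularies and probabilities, the value of `alpha` and all function names are assumptions made for illustration only; in the full model the word distributions of the topics are themselves unknown and have to be estimated.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topics, alpha, length, rng):
    """Generate one document: a topic mixture, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)      # document-specific topic mixture
    words, assignments = [], []
    for _ in range(length):
        z = sample_categorical(theta, rng)    # latent topic for this position
        vocab, probs = topics[z]
        words.append(vocab[sample_categorical(probs, rng)])
        assignments.append(z)
    return theta, assignments, words

# Hypothetical toy topics: (vocabulary, word probabilities) per topic.
TOPICS = [
    (["goal", "match", "team"], [0.5, 0.3, 0.2]),   # a "sports" topic
    (["vote", "party", "law"], [0.4, 0.4, 0.2]),    # a "politics" topic
]

rng = random.Random(0)
theta, assignments, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=8, rng=rng)
print("topic mixture:", [round(p, 2) for p in theta])
print("words:", words)
```

Only `words` would be visible in a real corpus; `theta` and `assignments` are the latent quantities that the model posits behind them.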

LDA is used, among other things, for document modelling, text classification, information retrieval, collaborative filtering and the discovery of new content in text corpora. Further applications are found in bioinformatics.
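Applications of this kind need the topic mixture of each document, which has to be inferred from the observed words. The following sketch uses collapsed Gibbs sampling, a commonly used inference technique for LDA; it is not the variational method of the original paper, and it is meant as an illustration under stated assumptions rather than a reference implementation. The function name `gibbs_lda`, the hyperparameter values `alpha` and `beta`, the iteration count and the toy corpus are all hypothetical.

```python
import random

def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # count of topic k in doc d
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # total words in topic k
    assign = []
    # Start from a random topic for every word position.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights, k=1)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed estimate of each document's topic mixture.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word

# Hypothetical toy corpus: word ids index into this vocabulary.
vocab = ["goal", "match", "team", "vote", "party", "law"]
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2, 0], [4, 5, 3, 5, 4]]
theta, topic_word = gibbs_lda(docs, num_topics=2, vocab_size=len(vocab))
for row in theta:
    print([round(p, 2) for p in row])
```

Documents with similar rows in `theta` share topics, which is the kind of signal that retrieval or classification on top of LDA would use. The sampler is stochastic, so the exact numbers depend on the seed and the number of iterations.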

See also

Dirichlet distribution

References

David M. Blei, Andrew Y. Ng and Michael I. Jordan: Latent Dirichlet allocation. Journal of Machine Learning Research 3:993-1022, March 2003.

LDA implementation in C by David Blei.

Categories:
Multivariate statistics
Information retrieval
Computational linguistics
