Python, LDA: How to get the ids of keywords instead of the keywords themselves with Gensim?

Amy21 · Jan 20, 2017

Hello,

I am applying LDA with Gensim to extract keywords from documents. I can extract the topics and then assign these topics, together with their associated keywords, to the documents.

I would like to have the ids of these terms (keywords) instead of the terms themselves. I know that corpus holds, for each document i, a list of pairs [(term_id, term_frequency), ...], but I can't see how to use this in my code to extract only the ids and add them to my results.

My code is as follows:

Code:

from itertools import chain

import gensim
import pandas as pd

ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=passes, minimum_probability=0)

# Assigning the topics to the documents in the corpus
lda_corpus = ldamodel[corpus]

# Find the threshold: let's set it to 1/#clusters.
# To check that this is sane, we average all topic probabilities over all documents:
scores = list(chain(*[[score for topic_id, score in doc] for doc in lda_corpus]))

threshold = sum(scores)/len(scores)
print(threshold)

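# key_words, documents_corpus, documents_corpus_id, result and topic_tuple
# are defined earlier in my script (not shown here).
for t in range(len(topic_tuple)):
    # Top num_words keywords of topic t
    key_words.append([topic_tuple[t][j][0] for j in range(num_words)])

    # Documents, and their ids, whose probability for topic t exceeds the threshold
    documents_corpus.append([doc for topics, doc in zip(lda_corpus, doc_set) if topics[t][1] > threshold])
    documents_corpus_id.append([doc_id for topics, doc_id in zip(lda_corpus, doc_set_id) if topics[t][1] > threshold])

# Build the DataFrames once, after the loop
df_key_words = pd.DataFrame({'key_words': key_words})
df_documents_corpus = pd.DataFrame({'documents_corpus': documents_corpus})
df_documents_corpus_id = pd.DataFrame({'documents_corpus_id': documents_corpus_id})

result.append(pd.concat([df_key_words, df_documents_corpus, df_documents_corpus_id], axis=1))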
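To make the goal concrete, here is a minimal sketch of the kind of output I am after. It is only an illustration: the plain dict token2id stands in for Gensim's dictionary.token2id (the token -> id mapping), and the tokens, ids, keyword lists and the helper keywords_to_ids are made up for this example.

```python
# Stand-in for gensim's `dictionary.token2id` (token -> integer id).
# The tokens and ids below are invented for illustration only.
token2id = {"topic": 0, "model": 1, "corpus": 2, "keyword": 3}

# Hypothetical keyword lists, one per topic, shaped like `key_words` in my code.
key_words = [["model", "topic"], ["keyword", "corpus"]]


def keywords_to_ids(words, token2id):
    """Return the id of each keyword, skipping words absent from the mapping."""
    return [token2id[w] for w in words if w in token2id]


# One list of term ids per topic, in the same order as the keywords.
key_word_ids = [keywords_to_ids(words, token2id) for words in key_words]
print(key_word_ids)
```

If I read the Gensim documentation correctly, ldamodel.get_topic_terms(t, topn=num_words) might also give (term_id, probability) pairs directly, which would avoid the lookup, but I have not confirmed how to fit that into the loop above.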

Thank you in advance. Let me know if more information is needed.

