test the topic changing over time with CSV format #2527

Closed
maplejia opened this issue Jun 14, 2019 · 3 comments
Labels
need info Not enough information for reproduce an issue, need more info from author

Comments

@maplejia

I am trying to use the DtmModel wrapper (from gensim.models.wrappers import DtmModel) to see how topics change over time.

My test file is an Amazon review file in CSV format, which includes reviews, ratings, dates and titles.

I am trying to replicate the example on https://radimrehurek.com/gensim/models/wrappers/dtmmodel.html to see how topics change over time, but I can't figure out how to create time_slices. Can anyone help me? Thank you very much.
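A sketch of how time_slices could be derived from the review dates. As I understand DTM, time_slices lists how many documents fall into each consecutive time period, in the same order as the corpus, so the reviews have to be sorted by date and the counts must add up to the number of documents. This is an illustration under those assumptions, not the method from the gensim docs; make_time_slices and the dates below are hypothetical.

```python
from datetime import date
from itertools import groupby
from typing import List, Sequence, Tuple


def make_time_slices(sorted_dates: Sequence[date]) -> Tuple[List[str], List[int]]:
    """Group chronologically sorted dates by (year, month).

    Returns period labels such as "2019-06" and the number of documents
    in each period, i.e. a candidate time_slices list.
    Hypothetical helper, not part of gensim.
    """
    if any(a > b for a, b in zip(sorted_dates, sorted_dates[1:])):
        raise ValueError("dates must be sorted; reorder the corpus first")
    labels, counts = [], []
    for (year, month), group in groupby(sorted_dates, key=lambda d: (d.year, d.month)):
        labels.append(f"{year}-{month:02d}")
        counts.append(sum(1 for _ in group))
    return labels, counts


# Toy input: one date per document, already in corpus order.
dates = [date(2019, 4, 2), date(2019, 4, 20), date(2019, 5, 1),
         date(2019, 6, 3), date(2019, 6, 9), date(2019, 6, 14)]
labels, time_slices = make_time_slices(dates)
print(labels)       # ['2019-04', '2019-05', '2019-06']
print(time_slices)  # [2, 1, 3]
assert sum(time_slices) == len(dates)
```

For yearly or daily slices, change the grouping key (and the label) accordingly. Months without any review simply produce no slice, so consecutive slices are not necessarily equally spaced in time.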

@maplejia
Author

Problem description

I am trying to replicate https://radimrehurek.com/gensim/models/wrappers/dtmmodel.html to see how topics change over time, but I can't create time_slices.
I am using an Amazon review CSV file in which each line contains the review content, date, ID and title.
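Because the slices are consecutive blocks of the corpus, the CSV rows need to be in date order before the text is cleaned and doc_term_matrix is built. A standard-library sketch with a made-up three-row CSV; the column names ID, Date and Content and the ISO date format are assumptions about Amazon_review.csv. With pandas, the equivalent would be converting the column with pd.to_datetime, sorting with df.sort_values on it, then reset_index(drop=True).

```python
import csv
import io
from datetime import datetime

# Made-up stand-in for Amazon_review.csv. The column names and the
# ISO date format are assumptions; adjust them to the real file.
SAMPLE = """ID,Date,Title,Content
3,2019-06-03,Nice,Great screen
1,2019-04-02,Ok,Battery could be better
2,2019-05-01,Good,Light and fast
"""
DATE_FORMAT = "%Y-%m-%d"


def load_sorted_reviews(fileobj):
    """Read review rows and return them sorted oldest-first by their Date column."""
    rows = list(csv.DictReader(fileobj))
    for row in rows:
        # Parse the text so sorting compares real dates
        # (plain string order only works for ISO-style dates).
        row["Date"] = datetime.strptime(row["Date"], DATE_FORMAT).date()
    return sorted(rows, key=lambda row: row["Date"])


reviews = load_sorted_reviews(io.StringIO(SAMPLE))
print([r["ID"] for r in reviews])  # ['1', '2', '3']
```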

Steps/code/corpus to reproduce

  1. import nltk
    from nltk import FreqDist
    nltk.download('stopwords') # run this one time

output:
[nltk_data] Downloading package stopwords to
[nltk_data] C:\Users\qsu2\AppData\Roaming\nltk_data...
[nltk_data] Package stopwords is already up-to-date!
True

  2. import pandas as pd
    pd.set_option("display.max_colwidth", 200)
    import numpy as np
    import re
    import spacy

    import gensim
    from gensim import corpora

    # libraries for visualization
    import pyLDAvis
    import pyLDAvis.gensim
    import matplotlib.pyplot as plt
    import seaborn as sns
    %matplotlib inline

output: 2019-06-14 06:14:01,351 : DEBUG : backend module://ipykernel.pylab.backend_inline version unknown

  3. import pandas as pd
    df = pd.read_csv("Amazon_review.csv", encoding="ISO-8859-1")

  4. # remove unwanted characters, numbers and symbols
    df['Content'] = df['Content'].str.replace("[^a-zA-Z#]", " ", regex=True)

  5. from nltk.corpus import stopwords
    stop_words = stopwords.words('english')
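What the cleaning in step 4 does, shown with the standard re module on a made-up review (pandas' str.replace with regex=True applies the same substitution to every row): every character that is not an ASCII letter or # becomes a space, so digits and punctuation disappear and contractions split into fragments, which the short-word filter in step 6 then drops.

```python
import re

# Same pattern as step 4: every character that is not a-z, A-Z or '#' becomes a space.
PATTERN = r"[^a-zA-Z#]"

review = 'Samsung\'s 11.6" screen: 5/5 #happy'
tokens = re.sub(PATTERN, " ", review).split()
print(tokens)  # ['Samsung', 's', 'screen', '#happy']

# Step 6 then drops words shorter than 3 characters.
kept = [w for w in tokens if len(w) > 2]
print(kept)    # ['Samsung', 'screen', '#happy']
```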

  6. # function to remove stopwords
    def remove_stopwords(rev):
        rev_new = " ".join([i for i in rev if i not in stop_words])
        return rev_new

    # remove short words (length < 3)
    df['Content'] = df['Content'].apply(lambda x: ' '.join([w for w in x.split() if len(w)>2]))

    # remove stopwords from the text
    reviews = [remove_stopwords(r.split()) for r in df['Content']]

    # make entire text lowercase
    reviews = [r.lower() for r in reviews]
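One thing worth checking in the step above: NLTK's English stop-word list is lowercase, and the reviews are lowercased only after remove_stopwords has run, so capitalised stop words such as "The" or "This" at the start of a sentence slip through the filter. A sketch comparing that order with lowercasing first; STOP_WORDS is a tiny hand-picked subset standing in for stopwords.words('english'), and using a set makes each membership test constant time.

```python
# Tiny stand-in for stopwords.words('english'); like the real list, all lowercase.
STOP_WORDS = {"the", "this", "and", "is", "was", "very"}


def clean_review(text: str) -> str:
    """Lowercase first, then drop short words and stop words."""
    words = text.lower().split()
    return " ".join(w for w in words if len(w) > 2 and w not in STOP_WORDS)


def clean_review_original_order(text: str) -> str:
    """The order used above: short words, then stop words, then lowercase."""
    words = [w for w in text.split() if len(w) > 2]
    words = [w for w in words if w not in STOP_WORDS]
    return " ".join(words).lower()


review = "This laptop is very light and The keyboard was great"
print(clean_review_original_order(review))  # this laptop light the keyboard great
print(clean_review(review))                 # laptop light keyboard great
```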

7. import spacy
!python -m spacy download en
output:
Requirement already satisfied: en_core_web_sm==2.1.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm==2.1.0 in c:\users\qsu2\appdata\local\continuum\anaconda3\lib\site-packages (2.1.0)
[+] Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
symbolic link created for C:\Users\qsu2\AppData\Local\Continuum\anaconda3\lib\site-packages\spacy\data\en <<===>> C:\Users\qsu2\AppData\Local\Continuum\anaconda3\lib\site-packages\en_core_web_sm
[+] Linking successful
C:\Users\qsu2\AppData\Local\Continuum\anaconda3\lib\site-packages\en_core_web_sm
-->
C:\Users\qsu2\AppData\Local\Continuum\anaconda3\lib\site-packages\spacy\data\en
You can now load the model via spacy.load('en')

  8. nlp = spacy.load('en', disable=['parser', 'ner'])

    def lemmatization(texts, tags=['NOUN', 'ADJ']): # filter noun and adjective
        output = []
        for sent in texts:
            doc = nlp(" ".join(sent))
            output.append([token.lemma_ for token in doc if token.pos_ in tags])
        return output

  9. tokenized_reviews = pd.Series(reviews).apply(lambda x: x.split())
    print(tokenized_reviews[1])

output:

['bought', 'samsung', 'reviewed', 'happy', 'new', 'version', 'released', 'resist', 'buying', 'married', 'geek', 'never', 'many', 'gadgets', 'around', 'house', 'here', 'love', 'laptop', 'incredibly', 'thin', 'light', 'however', 'use', 'feels', 'big', 'keyboard', 'full', 'size', 'one', 'keyboard', 'great', 'better', 'trackpad', 'huge', 'laptop', 'size', 'improvement', 'previous', 'version', 'trackpad', 'surface', 'smooth', 'almost', 'feels', 'like', 'glass', 'makes', 'scrolling', 'breeze', 'overall', 'hardware', 'great', 'looks', 'really', 'good', 'especially', 'price', 'tag', 'expecting', 'surprises', 'software', 'since', 'old', 'latest', 'features', 'thanks', 'automatic', 'updates', 'however', 'pleasantly', 'surprised', 'speed', 'new', 'boots', 'quickly', 'pages', 'load', 'really', 'fast', 'great', 'use', 'there', 'also', 'new', 'add', 'ons', 'improvement', 'well', 'apps', 'create', 'google', 'doc', 'slide', 'spreadsheet', 'one', 'click', 'well', 'new', 'camera', 'app', 'lot', 'fun', 'hope', 'google', 'releases', 'old', 'all', 'happy', 'laptop', 'highly', 'recommend', 'great', 'features', 'usability', 'amazing', 'price', 'great', 'christmas', 'gift', 'idea']

  10. reviews_2 = lemmatization(tokenized_reviews)
    print(reviews_2[1]) # print lemmatized review

output:

['samsung', 'happy', 'new', 'version', 'resist', 'married', 'geek', 'many', 'gadget', 'house', 'laptop', 'thin', 'light', 'big', 'keyboard', 'full', 'size', 'keyboard', 'well', 'trackpad', 'huge', 'laptop', 'size', 'improvement', 'previous', 'version', 'trackpad', 'surface', 'smooth', 'glass', 'overall', 'hardware', 'great', 'good', 'price', 'tag', 'surprise', 'software', 'old', 'late', 'feature', 'thank', 'automatic', 'update', 'surprised', 'speed', 'new', 'boot', 'load', 'great', 'use', 'new', 'on', 'improvement', 'well', 'app', 'doc', 'slide', 'spreadsheet', 'click', 'new', 'camera', 'app', 'lot', 'fun', 'hope', 'release', 'old', 'happy', 'laptop', 'great', 'feature', 'usability', 'amazing', 'price', 'great', 'christmas', 'gift', 'idea']

  11. dictionary = corpora.Dictionary(reviews_2)

output: 2019-06-14 06:15:45,260 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2019-06-14 06:15:45,342 : INFO : built Dictionary(4778 unique tokens: ['amazed', 'app', 'background', 'bag', 'bookmark']...) from 1291 documents (total 60491 corpus positions)

  12. doc_term_matrix = [dictionary.doc2bow(rev) for rev in reviews_2]
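For readers unfamiliar with the corpus format: each entry of doc_term_matrix is a bag of words, i.e. a list of (token_id, count) pairs for one review. Below is a simplified stand-in for corpora.Dictionary and doc2bow, for illustration only; gensim assigns ids by its own rules (and its Dictionary also offers filter_extremes for pruning rare or very common words), so the ids here will not match gensim's.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_vocab(docs: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct token, in first-seen order."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(doc: Sequence[str], vocab: Dict[str, int]) -> List[Tuple[int, int]]:
    """Count known tokens in one document; unknown tokens are ignored."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


docs = [["laptop", "great", "laptop"], ["screen", "great"]]
vocab = build_vocab(docs)
print(vocab)                             # {'laptop': 0, 'great': 1, 'screen': 2}
print([to_bow(d, vocab) for d in docs])  # [[(0, 2), (1, 1)], [(1, 1), (2, 1)]]
```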

  13. # Creating the object for LDA model using gensim library
    LDA = gensim.models.ldamodel.LdaModel

    # Build LDA model
    lda_model = LDA(corpus=doc_term_matrix, id2word=dictionary, num_topics=10, random_state=100,
                    chunksize=1000, passes=50)

output:
2019-06-14 07:33:26,336 : INFO : using symmetric alpha at 0.1
2019-06-14 07:33:26,341 : INFO : using symmetric eta at 0.1
2019-06-14 07:33:26,367 : INFO : using serial LDA version on this node
2019-06-14 07:33:26,623 : INFO : running online (multi-pass) LDA training, 10 topics, 50 passes over the supplied corpus of 1291 documents, updating model once every 1000 documents, evaluating perplexity every 1291 documents, iterating 50x with a convergence threshold of 0.001000
2019-06-14 07:33:26,628 : INFO : PROGRESS: pass 0, at document #1000/1291
2019-06-14 07:33:26,631 : DEBUG : performing inference on a chunk of 1000 documents
2019-06-14 07:33:28,961 : DEBUG : 413/1000 documents converged within 50 iterations
2019-06-14 07:33:28,975 : DEBUG : updating topics
2019-06-14 07:33:28,997 : INFO : merging changes from 1000 documents into a model of 1291 documents
2019-06-14 07:33:29,052 : INFO : topic #6 (0.100): 0.013*"good" + 0.012*"chrome" + 0.012*"laptop" + 0.010*"computer" + 0.010*"thing" + 0.010*"price" + 0.009*"great" + 0.008*"love" + 0.008*"chromebook" + 0.008*"book"
2019-06-14 07:33:29,055 : INFO : topic #1 (0.100): 0.017*"chromebook" + 0.015*"use" + 0.014*"laptop" + 0.013*"samsung" + 0.010*"device" + 0.009*"keyboard" + 0.009*"battery" + 0.009*"work" + 0.008*"chrome" + 0.008*"computer"
2019-06-14 07:33:29,056 : INFO : topic #3 (0.100): 0.028*"computer" + 0.025*"chromebook" + 0.016*"laptop" + 0.015*"great" + 0.012*"good" + 0.011*"use" + 0.011*"work" + 0.010*"chrome" + 0.009*"keyboard" + 0.009*"battery"
2019-06-14 07:33:29,058 : INFO : topic #0 (0.100): 0.027*"chromebook" + 0.017*"time" + 0.016*"computer" + 0.010*"laptop" + 0.008*"use" + 0.008*"great" + 0.008*"device" + 0.008*"machine" + 0.008*"little" + 0.007*"day"
2019-06-14 07:33:29,059 : INFO : topic #8 (0.100): 0.022*"chromebook" + 0.018*"use" + 0.014*"great" + 0.014*"laptop" + 0.012*"good" + 0.012*"price" + 0.010*"keyboard" + 0.010*"computer" + 0.008*"web" + 0.007*"screen"
2019-06-14 07:33:29,061 : INFO : topic diff=5.212306, rho=1.000000
2019-06-14 07:33:29,065 : DEBUG : bound: at document #0
2019-06-14 07:33:29,933 : INFO : -8.262 per-word bound, 307.1 perplexity estimate based on a held-out corpus of 291 documents with 9145 words
2019-06-14 07:33:29,935 : INFO : PROGRESS: pass 0, at document #1291/1291
2019-06-14 07:33:29,936 : DEBUG : performing inference on a chunk of 291 documents
2019-06-14 07:33:30,525 : DEBUG : 167/291 documents converged within 50 iterations
2019-06-14 07:33:30,527 : DEBUG : updating topics
2019-06-14 07:33:30,531 : INFO : merging changes from 291 documents into a model of 1291 documents
2019-06-14 07:33:30,536 : INFO : topic #7 (0.100): 0.018*"laptop" + 0.018*"thing" + 0.016*"great" + 0.015*"computer" + 0.013*"chromebook" + 0.011*"problem" + 0.011*"chrome" + 0.010*"time" + 0.008*"device" + 0.008*"use"
2019-06-14 07:33:30,538 : INFO : topic #4 (0.100): 0.023*"easy" + 0.015*"chromebook" + 0.014*"screen" + 0.013*"laptop" + 0.013*"computer" + 0.013*"use" + 0.010*"great" + 0.009*"size" + 0.007*"daughter" + 0.007*"work"
2019-06-14 07:33:30,540 : INFO : topic #3 (0.100): 0.029*"computer" + 0.023*"chromebook" + 0.019*"laptop" + 0.016*"great" + 0.013*"work" + 0.012*"good" + 0.012*"internet" + 0.011*"use" + 0.011*"chrome" + 0.010*"easy"
2019-06-14 07:33:30,541 : INFO : topic #8 (0.100): 0.023*"chromebook" + 0.016*"laptop" + 0.015*"use" + 0.014*"great" + 0.013*"printer" + 0.013*"good" + 0.011*"price" + 0.010*"computer" + 0.009*"keyboard" + 0.007*"video"
2019-06-14 07:33:30,542 : INFO : topic #9 (0.100): 0.020*"laptop" + 0.019*"use" + 0.018*"chromebook" + 0.016*"great" + 0.016*"computer" + 0.012*"light" + 0.012*"love" + 0.010*"work" + 0.010*"key" + 0.010*"easy"
2019-06-14 07:33:30,544 : INFO : topic diff=1.408125, rho=0.707107
2019-06-14 07:33:30,545 : INFO : PROGRESS: pass 1, at document #1000/1291
2019-06-14 07:33:30,546 : DEBUG : performing inference on a chunk of 1000 documents
2019-06-14 07:33:32,312 : DEBUG : 694/1000 documents converged within 50 iterations
2019-06-14 07:33:32,314 : DEBUG : updating topics
2019-06-14 07:33:32,318 : INFO : merging changes from 1000 documents into a model of 1291 documents
2019-06-14 07:33:32,322 : INFO : topic #0 (0.100): 0.028*"chromebook" + 0.020*"time" + 0.012*"computer" + 0.009*"device" + 0.009*"machine" + 0.009*"laptop" + 0.008*"little" + 0.008*"new" + 0.008*"screen" + 0.007*"great"
2019-06-14 07:33:32,324 : INFO : topic #3 (0.100): 0.031*"computer" + 0.024*"chromebook" + 0.019*"laptop" + 0.015*"great" + 0.012*"use" + 0.012*"good" + 0.011*"work" + 0.011*"chrome" + 0.011*"internet" + 0.010*"thing"
2019-06-14 07:33:32,326 : INFO : topic #2 (0.100): 0.034*"chromebook" + 0.015*"computer" + 0.014*"screen" + 0.013*"laptop" + 0.008*"app" + 0.008*"web" + 0.008*"use" + 0.008*"samsung" + 0.007*"time" + 0.007*"little"
2019-06-14 07:33:32,327 : INFO : topic #4 (0.100): 0.019*"easy" + 0.015*"chromebook" + 0.015*"screen" + 0.012*"laptop" + 0.011*"size" + 0.011*"use" + 0.011*"computer" + 0.009*"great" + 0.009*"light" + 0.009*"month"
2019-06-14 07:33:32,329 : INFO : topic #7 (0.100): 0.018*"thing" + 0.017*"laptop" + 0.014*"computer" + 0.014*"great" + 0.012*"chromebook" + 0.012*"chrome" + 0.011*"problem" + 0.010*"device" + 0.010*"time" + 0.010*"issue"
2019-06-14 07:33:32,330 : INFO : topic diff=0.850735, rho=0.551234
2019-06-14 07:33:32,334 : DEBUG : bound: at document #0
2019-06-14 07:33:32,947 : INFO : -7.668 per-word bound, 203.4 perplexity estimate based on a held-out corpus of 291 documents with 9145 words
2019-06-14 07:33:32,948 : INFO : PROGRESS: pass 1, at document #1291/1291
2019-06-14 07:33:32,950 : DEBUG : performing inference on a chunk of 291 documents
2019-06-14 07:33:33,315 : DEBUG : 262/291 documents converged within 50 iterations
2019-06-14 07:33:33,317 : DEBUG : updating topics
2019-06-14 07:33:33,321 : INFO : merging changes from 291 documents into a model of 1291 documents
2019-06-14 07:33:33,325 : INFO : topic #8 (0.100): 0.023*"chromebook" + 0.018*"printer" + 0.016*"laptop" + 0.015*"great" + 0.014*"use" + 0.014*"good" + 0.012*"price" + 0.009*"computer" + 0.009*"keyboard" + 0.009*"video"
2019-06-14 07:33:33,327 : INFO : topic #1 (0.100): 0.024*"chromebook" + 0.024*"samsung" + 0.014*"use" + 0.012*"product" + 0.011*"laptop" + 0.011*"battery" + 0.010*"acer" + 0.010*"work" + 0.009*"screen" + 0.009*"keyboard"
2019-06-14 07:33:33,329 : INFO : topic #0 (0.100): 0.028*"chromebook" + 0.021*"time" + 0.009*"new" + 0.009*"laptop" + 0.009*"computer" + 0.009*"machine" + 0.009*"device" + 0.008*"great" + 0.008*"little" + 0.008*"window"
2019-06-14 07:33:33,331 : INFO : topic #3 (0.100): 0.031*"computer" + 0.022*"chromebook" + 0.020*"laptop" + 0.015*"great" + 0.012*"work" + 0.012*"use" + 0.012*"internet" + 0.011*"good" + 0.011*"chrome" + 0.011*"thing"
2019-06-14 07:33:33,332 : INFO : topic #7 (0.100): 0.019*"thing" + 0.017*"laptop" + 0.016*"great" + 0.013*"power" + 0.012*"problem" + 0.012*"computer" + 0.012*"chrome" + 0.010*"chromebook" + 0.009*"time" + 0.008*"issue"
2019-06-14 07:33:33,333 : INFO : topic diff=0.732475, rho=0.551234
2019-06-14 07:33:33,335 : INFO : PROGRESS: pass 2, at document #1000/1291
2019-06-14 07:33:33,336 : DEBUG : performing inference on a chunk of 1000 documents
2019-06-14 07:33:34,974 : DEBUG : 803/1000 documents converged within 50 iterations
2019-06-14 07:33:34,976 : DEBUG : updating topics
2019-06-14 07:33:34,980 : INFO : merging changes from 1000 documents into a model of 1291 documents
2019-06-14 07:33:34,984 : INFO : topic #4 (0.100): 0.023*"easy" + 0.015*"screen" + 0.014*"size" + 0.012*"repair" + 0.010*"daughter" + 0.010*"month" + 0.010*"laptop" + 0.010*"use" + 0.010*"chromebook" + 0.009*"great"
2019-06-14 07:33:34,986 : INFO : topic #9 (0.100): 0.023*"use" + 0.022*"laptop" + 0.018*"great" + 0.015*"computer" + 0.015*"chromebook" + 0.015*"key" + 0.015*"light" + 0.014*"love" + 0.013*"easy" + 0.012*"keyboard"
2019-06-14 07:33:34,987 : INFO : topic #2 (0.100): 0.035*"chromebook" + 0.016*"screen" + 0.012*"computer" + 0.012*"laptop" + 0.008*"samsung" + 0.008*"problem" + 0.008*"app" + 0.008*"little" + 0.008*"machine" + 0.007*"video"
2019-06-14 07:33:34,989 : INFO : topic #0 (0.100): 0.027*"chromebook" + 0.021*"time" + 0.010*"machine" + 0.010*"new" + 0.009*"device" + 0.009*"computer" + 0.008*"little" + 0.008*"window" + 0.008*"laptop" + 0.008*"system"
2019-06-14 07:33:34,991 : INFO : topic #1 (0.100): 0.024*"chromebook" + 0.021*"samsung" + 0.013*"use" + 0.011*"device" + 0.011*"battery" + 0.010*"chrome" + 0.010*"product" + 0.010*"laptop" + 0.010*"keyboard" + 0.010*"screen"
2019-06-14 07:33:34,992 : INFO : topic diff=0.545615, rho=0.482748
2019-06-14 07:33:34,996 : DEBUG : bound: at document #0
2019-06-14 07:33:35,532 : INFO : -7.344 per-word bound, 162.5 perplexity estimate based on a held-out corpus of 291 documents with 9145 words
2019-06-14 07:33:35,534 : INFO : PROGRESS: pass 2, at document #1291/1291
2019-06-14 07:33:35,535 : DEBUG : performing inference on a chunk of 291 documents
2019-06-14 07:33:35,849 : DEBUG : 279/291 documents converged within 50 iterations
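The perplexity figures in the log are derived from the per-word bound: as far as I can tell from gensim's logging code, it reports 2 raised to the negative bound (np.exp2(-perwordbound)). Checking that against the three log lines above; the small mismatches come from the bound being printed with only three decimals.

```python
# Per-word bounds and perplexities copied from the training log above.
log_values = [(-8.262, 307.1), (-7.668, 203.4), (-7.344, 162.5)]

for bound, reported in log_values:
    perplexity = 2 ** (-bound)
    print(f"bound {bound}: 2**{-bound} = {perplexity:.1f} (log says {reported})")
    # The bound is rounded to 3 decimals in the log, so allow a small tolerance.
    assert abs(perplexity - reported) < 0.5
```

Lower perplexity across passes (307.1, 203.4, 162.5) means the model fits the held-out chunk better as training proceeds.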

  14. lda_model.print_topics()

output:
[(0,
'0.024*"chromebook" + 0.021*"time" + 0.013*"new" + 0.011*"machine" + 0.011*"window" + 0.010*"system" + 0.010*"processor" + 0.010*"month" + 0.009*"review" + 0.008*"thing"'),
(1,
'0.033*"chromebook" + 0.025*"samsung" + 0.017*"screen" + 0.014*"device" + 0.013*"keyboard" + 0.012*"use" + 0.012*"battery" + 0.011*"chrome" + 0.010*"product" + 0.010*"acer"'),
(2,
'0.034*"chromebook" + 0.023*"screen" + 0.017*"computer" + 0.010*"little" + 0.010*"samsung" + 0.009*"case" + 0.009*"problem" + 0.008*"new" + 0.008*"shell" + 0.007*"lot"'),
(3,
'0.029*"computer" + 0.022*"chromebook" + 0.021*"laptop" + 0.014*"great" + 0.013*"use" + 0.012*"thing" + 0.012*"web" + 0.011*"internet" + 0.011*"app" + 0.011*"chrome"'),
(4,
'0.056*"repair" + 0.018*"screen" + 0.018*"daughter" + 0.017*"warranty" + 0.014*"customer" + 0.009*"company" + 0.009*"month" + 0.008*"money" + 0.008*"easy" + 0.008*"service"'),
(5,
'0.021*"time" + 0.015*"file" + 0.014*"work" + 0.013*"chromebook" + 0.011*"laptop" + 0.009*"note" + 0.009*"drive" + 0.009*"great" + 0.007*"everything" + 0.007*"student"'),
(6,
'0.015*"system" + 0.015*"real" + 0.012*"verizon" + 0.007*"glitch" + 0.007*"operating" + 0.006*"notebook" + 0.005*"datum" + 0.005*"part" + 0.005*"net" + 0.005*"straight"'),
(7,
'0.022*"power" + 0.020*"problem" + 0.019*"network" + 0.015*"supply" + 0.013*"laptop" + 0.013*"month" + 0.012*"thing" + 0.012*"issue" + 0.011*"support" + 0.011*"chrome"'),
(8,
'0.029*"printer" + 0.027*"chromebook" + 0.018*"video" + 0.014*"cloud" + 0.013*"amazon" + 0.012*"good" + 0.012*"skype" + 0.012*"use" + 0.010*"laptop" + 0.010*"work"'),
(9,
'0.035*"great" + 0.030*"laptop" + 0.030*"easy" + 0.027*"light" + 0.025*"use" + 0.020*"love" + 0.016*"good" + 0.015*"screen" + 0.015*"key" + 0.014*"small"')]
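The topic strings returned by print_topics are formatted text. To work with the numbers (for example to compare topics), lda_model.show_topics(formatted=False) returns (word, probability) pairs directly; if only the printed strings are at hand, a small regex can recover them. A sketch using topic 9 from the output above; parse_topic is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches one term such as 0.035*"great"
TERM = re.compile(r'([\d.]+)\*"([^"]+)"')


def parse_topic(topic_str: str) -> List[Tuple[str, float]]:
    """Turn '0.035*"great" + 0.030*"laptop" + ...' into [(word, weight), ...]."""
    return [(word, float(weight)) for weight, word in TERM.findall(topic_str)]


topic_9 = ('0.035*"great" + 0.030*"laptop" + 0.030*"easy" + 0.027*"light" + '
           '0.025*"use" + 0.020*"love" + 0.016*"good" + 0.015*"screen" + '
           '0.015*"key" + 0.014*"small"')
pairs = parse_topic(topic_9)
print(pairs[:3])  # [('great', 0.035), ('laptop', 0.03), ('easy', 0.03)]
# The top 10 words cover only part of the topic's probability mass.
print(round(sum(w for _, w in pairs), 3))  # 0.227
```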


  15. from gensim.test.utils import common_corpus, common_dictionary
    from gensim.models.wrappers import DtmModel
    path_to_dtm_binary = "C:/Users/qsu2/DTM/dtm-win64.exe"

  16. model = DtmModel(
        path_to_dtm_binary, corpus=doc_term_matrix, id2word=dictionary,
        time_slices=[1] * len(doc_term_matrix)
    )

output:
OSError Traceback (most recent call last)
in ()
1 model = DtmModel(
2 path_to_dtm_binary, corpus=doc_term_matrix, id2word=dictionary,
----> 3 time_slices= [1] * len(doc_term_matrix)
4 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\wrappers\dtmmodel.py in __init__(self, dtm_path, corpus, time_slices, mode, model, num_topics, id2word, prefix, lda_sequence_min_iter, lda_sequence_max_iter, lda_max_em_iter, alpha, top_chain_var, rng_seed, initialize_lda)
162
163 if corpus is not None:
--> 164 self.train(corpus, time_slices, mode, model)
165
166 def fout_liklihoods(self):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\gensim\models\wrappers\dtmmodel.py in train(self, corpus, time_slices, mode, model)
367 check_output(args=cmd, stderr=PIPE)
368
--> 369 self.em_steps = np.loadtxt(self.fem_steps())
370 self.init_ss = np.loadtxt(self.flda_ss())
371

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows)
960 fname = os_fspath(fname)
961 if _is_string_like(fname):
--> 962 fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
963 fencoding = getattr(fh, 'encoding', 'latin1')
964 fh = iter(fh)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py in open(path, mode, destpath, encoding, newline)
264
265 ds = DataSource(destpath)
--> 266 return ds.open(path, mode, encoding=encoding, newline=newline)
267
268

~\AppData\Local\Continuum\anaconda3\lib\site-packages\numpy\lib\_datasource.py in open(self, path, mode, encoding, newline)
622 encoding=encoding, newline=newline)
623 else:
--> 624 raise IOError("%s not found." % path)
625
626

OSError: C:\Users\qsu2\AppData\Local\Temp\f909e7_train_out/em_log.dat not found.
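The traceback only shows that the DTM binary did not write em_log.dat; it does not say why. Independently of that, time_slices=[1] * len(doc_term_matrix) (the pattern from the documentation example, which runs on the tiny common_corpus) asks for 1291 time periods with a single review each, which is probably not what was intended here. A quick sanity check before launching the slow binary could look like the sketch below; check_time_slices and its min_docs threshold are hypothetical choices, not gensim rules. Also, as far as I know, gensim 4.0 removed gensim.models.wrappers (DtmModel included); gensim.models.LdaSeqModel, which takes a time_slice argument with the same meaning, is the in-library dynamic topic model.

```python
from typing import List, Sequence


def check_time_slices(time_slices: Sequence[int], num_docs: int, min_docs: int = 20) -> List[str]:
    """Return a list of problems found in time_slices (empty if none).

    min_docs is an arbitrary threshold for "too few documents per period".
    """
    problems = []
    if sum(time_slices) != num_docs:
        problems.append(f"slices sum to {sum(time_slices)}, corpus has {num_docs} documents")
    if any(n <= 0 for n in time_slices):
        problems.append("every slice must contain at least one document")
    small = sum(1 for n in time_slices if n < min_docs)
    if small:
        problems.append(f"{small} of {len(time_slices)} slices have fewer than {min_docs} documents")
    return problems


print(check_time_slices([1] * 1291, 1291))
# ['1291 of 1291 slices have fewer than 20 documents']
print(check_time_slices([300, 300, 300, 391], 1291))  # []
```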

If I change step 16 to:

model = DtmModel(
    path_to_dtm_binary, corpus=doc_term_matrix, id2word=dictionary,
    time_slices=[300, 300, 300, 391]  # 1291 reviews in total
)
topics = model.show_topic(topicid=1, time=1, num_words=10)
topics

output:
[(0.4989328070379685, 'hour'),
(0.36703322592000454, 'want'),
(2.806406345101065e-05, 'sharper'),
(2.806406345101065e-05, 'tekkie'),
(2.806406345101065e-05, 'suitcase'),
(2.806406345101065e-05, 'stunning'),
(2.806406345101065e-05, 'sticking'),
(2.806406345101065e-05, 'staying'),
(2.806406345101065e-05, 'stapler'),
(2.806406345101065e-05, 'thingy')]

It seems to work, although training takes a long time.

But I need help on how to see how the topics change over time (for example by day, month or year).
Thanks.
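Once the corpus is sorted by date and time_slices holds one count per day, month or year, the time index t in show_topic(topicid=..., time=t, ...) should refer to the t-th period in that order, so topic change can be read by querying the same topic for every t. A sketch that collects, for a few words, their probability in one topic across all periods. It takes the show_topic function as a parameter so it can be tried without the DTM binary; the calling convention (keywords topicid, time and num_words, returning (probability, word) pairs) is taken from the call and output above, and fake_show_topic holds made-up numbers, not results.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed to behave like model.show_topic above: returns [(probability, word), ...].
ShowTopic = Callable[..., List[Tuple[float, str]]]


def word_trajectories(show_topic: ShowTopic, topicid: int, num_times: int,
                      words: Sequence[str], num_words: int = 50) -> Dict[str, List[float]]:
    """Probability of each word in `topicid` at every time slice.

    Words outside the top `num_words` of a slice are recorded as 0.0, so use
    a large num_words (up to the vocabulary size) for exact values.
    """
    trajectories: Dict[str, List[float]] = {w: [] for w in words}
    for t in range(num_times):
        weights = {word: prob for prob, word in
                   show_topic(topicid=topicid, time=t, num_words=num_words)}
        for w in words:
            trajectories[w].append(weights.get(w, 0.0))
    return trajectories


# Made-up stand-in for model.show_topic, for illustration only.
FAKE = {0: [(0.05, "battery"), (0.02, "screen")],
        1: [(0.03, "battery"), (0.04, "screen")],
        2: [(0.01, "battery"), (0.06, "screen")]}


def fake_show_topic(topicid, time, num_words):
    return FAKE[time][:num_words]


labels = ["2019-04", "2019-05", "2019-06"]  # one label per time slice
traj = word_trajectories(fake_show_topic, topicid=1, num_times=len(labels),
                         words=["battery", "screen"])
for word, probs in traj.items():
    print(word, dict(zip(labels, probs)))
```

With the real model the call would be word_trajectories(model.show_topic, topicid=1, num_times=len(time_slices), words=[...]); plotting each list against the period labels (for example with matplotlib, already imported in step 2) shows how a topic's vocabulary shifts from period to period.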

Versions
python 3.6.6 Win10


@mpenkov
Collaborator

mpenkov commented Jun 21, 2019

Please edit your comment and fix the markdown formatting. It's a bit difficult to see what you're trying to do because the formatting is so messed up.

@mpenkov mpenkov added the need info Not enough information for reproduce an issue, need more info from author label Jun 21, 2019
@mpenkov
Collaborator

mpenkov commented Sep 28, 2019

Closing due to inactivity.

@mpenkov mpenkov closed this as completed Sep 28, 2019