Chapter 13: Naive Bayes #82

Open
m4ur0jr opened this issue Jan 4, 2020 · 0 comments

m4ur0jr commented Jan 4, 2020

I can't get the counts value.

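```python
# Testing the model
import glob
import random
import re
from collections import Counter

# NaiveBayesClassifier and split_data come from earlier in the chapter and are
# assumed to be defined or imported before this code runs.

# glob.glob only expands wildcards: a bare folder path matches just the folder,
# and open() then fails on it. Append a pattern that reaches the message files
# in your layout (for example r"\*" or r"\*\*") before running.
path = r"C:\Users\mauro\Google Drive\Análise de Dados\Python\Livro_DS_Zero\Spam\20021010_spam"

def get_subject_data(path):
    """Return a list of (subject, is_spam) pairs for the files matching path."""
    data = []

    # regex for stripping out the leading "Subject:" and any spaces after it
    subject_regex = re.compile(r"^Subject:\s+")

    # glob.glob returns every filename that matches the wildcarded path
    for fn in glob.glob(path):
        # a file counts as spam unless "ham" appears somewhere in its path
        is_spam = "ham" not in fn

        # errors="ignore" skips bytes that cannot be decoded in some messages
        with open(fn, 'r', errors="ignore") as file:
            for line in file:
                if line.startswith("Subject:"):
                    subject = subject_regex.sub("", line).strip()
                    data.append((subject, is_spam))

    return data

def p_spam_given_word(word_prob):
    word, prob_if_spam, prob_if_not_spam = word_prob
    return prob_if_spam / (prob_if_spam + prob_if_not_spam)

def train_and_test_model(path):
    data = get_subject_data(path)
    random.seed(0)      # just so you get the same answers as me
    train_data, test_data = split_data(data, 0.75)

    classifier = NaiveBayesClassifier()
    classifier.train(train_data)

    classified = [(subject, is_spam, classifier.classify(subject))
                  for subject, is_spam in test_data]

    counts = Counter((is_spam, spam_probability > 0.5)  # (actual, predicted)
                     for _, is_spam, spam_probability in classified)

    # counts is local to this function, so return it to the caller
    return counts

# the function has to be called for counts to be computed at all
counts = train_and_test_model(path)
print(counts)
```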
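For reference, here is a minimal sketch of what the `counts` step should produce, independent of the file loading and the classifier. The `classified` triples are made-up values, not real model output, and `confusion_counts` is a hypothetical helper name used only for this illustration.

```python
from collections import Counter

# Hypothetical classifier output: (subject, actual label, predicted spam probability).
classified = [
    ("cheap meds now", True, 0.97),
    ("meeting notes", False, 0.02),
    ("you won a prize", True, 0.40),
    ("lunch tomorrow?", False, 0.81),
    ("re: project plan", False, 0.10),
]

def confusion_counts(classified, threshold=0.5):
    """Count (actual, predicted) pairs; predicted is True when probability > threshold."""
    return Counter((is_spam, prob > threshold) for _, is_spam, prob in classified)

counts = confusion_counts(classified)
print(counts)

# Individual cells are looked up by their (actual, predicted) key.
true_positives = counts[(True, True)]
false_positives = counts[(False, True)]
```

If this works but the full script does not, the problem is probably upstream of `Counter`: either the function is never called, or `get_subject_data` returns an empty list because the path matches no files.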
