Bible vs Nag Hammadi match score

Should the Gnostic texts found near Nag Hammadi be part of the Bible? This project uses modern NLP techniques to try to answer that controversial question.

Description and goal:

The Bible: The early Church collected writings, letters, gospels and other texts related to the teachings of Jesus Christ and assembled them into the Bible
Gnostic texts: Other texts were rejected as heresy and declared not to be the Word of God. They are known as the Gnostic texts. Most of them were lost or destroyed, and only mentions of them or questionable copies survived
Nag Hammadi library: That changed in 1945, when a collection of early Christian and authentic Gnostic texts was discovered near the Upper Egyptian town of Nag Hammadi
Null hypothesis (H0): The Bible and the rejected Gnostic texts are part of the same teaching
The Goal: Use NLP to define the bounds that separate the Bible from other texts that are clearly not part of it (Control texts), then test H0 by observing whether the Gnostic texts fall inside or outside those bounds

The dataset:

The raw texts are in the data folder:
- The Bible (King James translation)
- Nag Hammadi Gnostic texts
- Control texts
The data_prep.py module contains the functions that parse the raw texts and return a dataframe with one row per sentence

>>> import data_prep as dp
>>> df = dp.return_dataset()
>>> list(df.columns)
#OUTPUT: ['sentence', 'NUM', 'LIBRARY', 'AUTHOR', 'TEXT_NAME', 'TRANSLATION', 'char_count', 'words_count']
>>> df.sample(1).sentence
#OUTPUT: 17050    now the lord hath brought it , and done accord..

001_dataset_preview.ipynb contains usage examples of the data_prep functions return_dataset() and print_dataset_stats()
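
To illustrate the one-row-per-sentence layout without loading the repository, here is a minimal sketch. It is not the actual data_prep implementation: the helper name split_into_sentence_rows, the naive punctuation-based splitting and the lowercasing are all assumptions, and only a few of the columns listed above are mimicked.

```python
import re

def split_into_sentence_rows(text, author, text_name):
    """Hypothetical helper: one dict per sentence, mimicking a few dataframe columns."""
    # Naive split after '.', '!' or '?' followed by whitespace (assumption;
    # the real parser in data_prep.py may split differently).
    parts = re.split(r"(?<=[.!?])\s+", text)
    sentences = [p.strip().lower() for p in parts if p.strip()]
    rows = []
    for num, sentence in enumerate(sentences):
        rows.append({
            "sentence": sentence,
            "NUM": num,
            "AUTHOR": author,
            "TEXT_NAME": text_name,
            "char_count": len(sentence),
            "words_count": len(sentence.split()),
        })
    return rows

rows = split_into_sentence_rows(
    "In the beginning was the Word. And the Word was with God.",
    author="John",
    text_name="Gospel of John",
)
for row in rows:
    print(row["NUM"], row["words_count"], row["sentence"])
```

A list of dicts like this can be passed directly to pandas.DataFrame if a dataframe is wanted.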

H0 rejection by Word Frequency and Count comparison:

Calculations and details are in the notebook: 002_word_freq_and_count_comparisons.ipynb

The metrics:

Word Count Diff: Does the author use long or short sentences? Measure how many words an average sentence of each author has, and take the difference in average word count between the two authors (for example, author A uses 60 words per sentence on average, while author B uses only 15, so author B writes significantly shorter sentences than author A)
Word Freq Diff: Which words does the author use? Build a list of every word used by the two authors, compute how frequently each author uses every word, and take the average of the per-word frequency differences (for example, author A uses the word "love" frequently while author B doesn't, and author B uses the word "pain" a lot while author A doesn't)
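
The two metrics above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from data_analysis.py: the function names are hypothetical, words are split on whitespace, frequencies are relative (count divided by the author's total words), and the word list is taken as the union of both vocabularies.

```python
from collections import Counter

def mean_words_per_sentence(sentences):
    """Average number of whitespace-separated words per sentence."""
    return sum(len(s.split()) for s in sentences) / len(sentences)

def word_count_diff(sentences_a, sentences_b):
    """Absolute difference between the two authors' average sentence lengths."""
    return abs(mean_words_per_sentence(sentences_a)
               - mean_words_per_sentence(sentences_b))

def relative_freqs(sentences):
    """Map each word to its share of all words used by the author."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def word_freq_diff(sentences_a, sentences_b):
    """Mean absolute frequency difference over the union vocabulary (assumption)."""
    freq_a = relative_freqs(sentences_a)
    freq_b = relative_freqs(sentences_b)
    vocab = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) for w in vocab) / len(vocab)

# Toy data, not taken from the dataset.
author_a = ["love is patient and love is kind", "love never fails"]
author_b = ["pain"]

print(word_count_diff(author_a, author_b))
print(word_freq_diff(author_a, author_b))
```

Both functions return 0 when an author is compared with itself, and grow as the styles diverge.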

The experiment:

Rate the Bible authors: for every author in the Bible, compare that author's style with the style of the rest of the Bible. This gives the bounds of acceptable deviation
Rate the Control authors: compare the styles of the two Control authors, who have nothing to do with Christianity, with the Bible. Make sure they fall outside the acceptable deviations
Rate the Nag Hammadi authors: compare the Nag Hammadi authors with the Bible and check whether they fall within or outside the acceptable deviations
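
The three steps above can be sketched as a leave-one-out comparison. This is an illustration only: the function names are hypothetical, the metric is a simplified average-sentence-length difference, and taking the largest Bible-author score as the upper bound is an assumption rather than the notebook's actual rule.

```python
def mean_words(sentences):
    return sum(len(s.split()) for s in sentences) / len(sentences)

def length_metric(sentences_a, sentences_b):
    """Simplified stand-in for Word Count Diff."""
    return abs(mean_words(sentences_a) - mean_words(sentences_b))

def leave_one_out_scores(corpus, metric):
    """Score each author against all the other authors of the same corpus pooled."""
    scores = {}
    for author, sentences in corpus.items():
        rest = [s for other, ss in corpus.items() if other != author for s in ss]
        scores[author] = metric(sentences, rest)
    return scores

def scores_against_corpus(corpus, outsiders, metric):
    """Score each outside author against the whole reference corpus."""
    pooled = [s for ss in corpus.values() for s in ss]
    return {author: metric(ss, pooled) for author, ss in outsiders.items()}

# Toy corpora (assumption): author name -> list of sentences.
bible = {
    "A": ["a b c", "a b c d"],
    "B": ["a b c d", "a b c"],
    "C": ["a b c d"],
}
outsiders = {"X": ["a b c d e f g h i j"]}

bible_scores = leave_one_out_scores(bible, length_metric)
upper_bound = max(bible_scores.values())  # assumed definition of the bound
outsider_scores = scores_against_corpus(bible, outsiders, length_metric)

for author, score in outsider_scores.items():
    verdict = "outside" if score > upper_bound else "within"
    print(author, verdict, "the acceptable deviation")
```

The same scores_against_corpus call would be used for the Control authors and for the Nag Hammadi authors; only the outsiders dictionary changes.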

The results:

Plot: Once the experiments are done and every author has a Word Freq Diff and a Word Count Diff score, we can plot the two scores on the X and Y axes and observe a visible separation between the groups
P-value by permutation test: We can estimate the P-value of H0 for each metric separately. The observed P-value is approximately 0.0% for both metrics, which supports rejecting H0
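
A permutation test for one metric can be sketched as follows. This is a generic two-sided test on the difference of group means, not necessarily the procedure used in the notebook: the function name, the grouping (per-author scores of Bible authors versus Nag Hammadi authors) and the toy numbers are all assumptions. The +1 correction keeps the estimate strictly above zero, so a very small P-value is reported as a small positive number rather than as 0.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of group means."""
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the scores under H0
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction: the estimate can never be exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Toy per-author scores (hypothetical, not results from the dataset).
bible_scores = [0.10, 0.20, 0.15, 0.12, 0.18, 0.11]
nag_hammadi_scores = [0.90, 1.00, 0.95, 0.92, 0.98, 0.91]

p_value = permutation_p_value(bible_scores, nag_hammadi_scores)
print(f"estimated P-value: {p_value:.4f}")
```

With only a handful of authors per group, the smallest achievable P-value is limited by the number of distinct relabellings, so the result should be read together with the group sizes.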


TraxData313/Bible-vs-NagHammadi-match-score
