Skip to content

Should the gnostic texts found near Nag Hammadi be a part of the Bible? Use modern NLP techniques try to answer that controversial question.

Notifications You must be signed in to change notification settings

TraxData313/Bible-vs-NagHammadi-match-score

Repository files navigation

Bible vs Nag Hammadi match score

Should the gnostic texts found near Nag Hammadi be a part of the Bible? Use modern NLP techniques try to answer that controversial question.



Description and goal:

  • The Bible: The early Church arranged scripts, letters, ghospels and other texts related to the teachings of Jesus Christ, and put together the Bible
  • Gnostic texts: There were other texts that were rejected as herecy, claimed not to be Word of God. They are known as Gnostic texts. Most of those texts were lost or destroyed and only mentions of them or questionable copies survived
  • Nag Hammadi library: That changed in the year of 1945, when a collection of early Christian and authentic Gnostic texts were discovered near the Upper Egyptian town of Nag Hammadi
  • Zero hypothesis (H0): The Bible and the rejected Gnostic texts are part of the same teaching
  • The Goal: Use NLP to define the bounds that separate the Bible from other texts that are clearly not part of it (Control texts), then prove that H0 is wrong by observing whether the Gnostic texts fit inside or outside of those bounds



The dataset:

>>> import data_prep as dp
>>> df = dp.return_dataset()
>>> list(df.columns)
#OUTPUT: ['sentence', 'NUM', 'LIBRARY', 'AUTHOR', 'TEXT_NAME', 'TRANSLATION', 'char_count', 'words_count']
>>> df.sample(1).sentence
#OUTPUT: 17050    now the lord hath brought it , and done accord..

image

  • 001_dataset_preview.ipynb contains usage example of data_prep functions return_dataset() and print_dataset_stats() functions



H0 rejection by Word Frequency and Count comparison:

Calculations and detains in the Notebook: 002_word_freq_and_count_comparisons.ipynb

The metrics:

  • Word Count Diff: The author uses Long or Short sentences? Measure how much words an average sentence of the authors has, and look at word count difference between the two authors (for example, author A uses 60 words in a sencence on average, while author B uses only 15 - author B uses significantly shorter sencences than author A)
  • Word Freq Diff: What words the author uses? Get a list of every word used by both authors, rate the frequency of usage of every word by both authors and take the average of the differences between every word (for example, author A uses the word "love" frequently, while author B doesn't, but uses the word "pain" very much, while author A doesn't)

The experiment:

  • Rate the Bible authors: for every author in the Bible, compare his style with the style of the rest of the Bible. This way we will get the bounds of acceptable deviations
  • Rate the Control authors: now compare the styles of the two Control authors, which have nothing to do with Christianity, and compare them with the Bible. Make sure they are out of the acceptable deviations
  • Rate the Nag Hammadi texts authors: compare the Nag Hammadi authors and check whether they are within or out of the acceptable deviations

The results:

  • Plot: Once we perform the experiments and rate every author's Word Freq Diff and Word Count Diff, we can plot the results on the X and Y axis and observe that there is a visible separation
  • P-value by perm test: We can estimate the P-value of H0 related to each separate metric and prove that H0 is wrong. The observed P-value is below 0.0% for both metrics, proving that H0 is wrong

image

About

Should the gnostic texts found near Nag Hammadi be a part of the Bible? Use modern NLP techniques try to answer that controversial question.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published