Skip to content

Returns a word's distance from other words within sentences or phrases from given text

License

Notifications You must be signed in to change notification settings

jasonmhead/Word-Context-Distance

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Word-Context-Distance


This is exploratory learning code, for more complex application you might want to check out something like


get_word_distances_per_sentence

Returns a Python dictionary containing a word's distance from other words within sentences or phrases from given text returned object structure: {"word":[word's positions in each sentence ]}

each target word's absoloute distance against each word in a sentence is returned

def get_word_distances_per_sentence(text_to_parse, target_word, sentence_or_phrase, use_stopwords=1)
  • text_to_parse => string
  • sentence_or_phrase => "sentence" or "phrase" # this will further parse down sentances by , : ; breaking things into phrases as the basic unit
  • target_word => string
  • use_stopword => bool

Example Useage:

# get example data
import urllib2, word_context_distance
url = "http://www.gutenberg.org/files/2554/2554.txt"
response = urllib2.urlopen(url)
raw = response.read().decode('utf8')

# get the distance data
print word_context_distance.get_word_distances_per_sentence(raw[:111175], "sorrow", "sentence")

# the result should be
'''''''''
{u'heart': [5], u'full': [2], u'trifles': [8], u'would': [18], u'perhaps': [24], u'allow': [16], u'could': [26, 2], u'dounia': [19], u'it;': [15], u'besides': [20], u'letter': [10], u'ruin': [23], u'fill': [13]}
  • Utility functions for processing the result:

def get_avg_distance(distance_dict, allow_solo_words = 1)

def get_max_distance(distance_dict, allow_solo_words = 1)

def get_min_distance(distance_dict, allow_solo_words = 1)

def get_within_distance(distance_dict, max_distance, min_distance = 1, allow_solo_words = 1)

About

Returns a word's distance from other words within sentences or phrases from given text

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages