Skip to content

Latest commit

 

History

History
107 lines (93 loc) · 2.88 KB

1138.md

File metadata and controls

107 lines (93 loc) · 2.88 KB

题目:根据 1.1.10.4 节给出的暴力查找法编写一个程序 BruteForSearch,在你的计算机上比较它和 BinarySearch 处理 largeW.text 和 largeT.text 所需要的时间。

附表 1.1.10.4 的查找程序:

def rank(key, a):
    for i in range(0, len(a)):
        if key == a[i]:
            return i
    return -1

解答:先用 urllib3 下载题目中所需要的 largeText.txt 文件,体积90M左右,largeAllowlist.text,7M 左右。

def downloadLargetText(self):
	url = 'https://algs4.cs.princeton.edu/11model/largeText.txt'
    http = urllib3.PoolManager()
    resp = http.request('GET', url, preload_content=False) 
    with open('/Users/hexintao/Desktop/n/largeText.txt', 'wb') as out:
        while True:
            data = resp.read(32)
            if not data:
                break
            out.write(data)
    resp.release_conn()

然后通过 open 方法加载本地文件并转化为数组,模拟从命令行中读取。

def readList(self, isAllowlist):
    path = ''
    if isAllowlist:
        path = '/Users/hexintao/Desktop/n/largeAllowlist.txt'
    else:
        path = '/Users/hexintao/Desktop/n/largeText.txt'
    with open(path, 'r') as f:
        data = f.readlines()  #txt中所有字符串读入data
        i = -1
    for line in data:
        i += 1
        temp = line.split('\n')
        odom = temp[0]
        data[i] = int(odom)
    return data

查找的程序为:

def SearchTest(self, key, a):
	# 二分查找
    def binarySearch(k, a):
        lo = 0
        hi = max([len(a)-1, 0])
        while lo <= hi:
            mid = lo + (hi-lo)//2
            if a[mid] < k:
                lo = mid+1
            elif a[mid] > k:
                hi = mid-1
            else:
                return mid
        return -1
	# 暴力查找
    def rank(k, a):
        for i in range(0, len(a)):
            if k == a[i]:
                return i
        return -1
      
    for i in range(0, len(key)):
        res = binarySearch(key[i], a)
        # 对比的话,取消下一行注释即可
        # res = rank(key[i], a)
        if res < 0:
            print(f'{key[i]}')

最后执行的主程序:

e = Exercise()
# e.downloadLargetText()
allowList = e.readList(True)
largeList = e.readList(False)
allowList.sort()
startTime = datetime.datetime.now()
matchIdx = e.BruteForSearch(largeList, allowList)
endTime = datetime.datetime.now()
dur = (endTime - startTime).seconds

其中,allowListlargeList 的长度分别为:

len(allowList)
1000000
len(largeList)
10000000

最后运行的结果:在我的电脑,2019款13寸 MackbookPro 2.8GHz i7 处理器,暴力查找运行了三个多小时都没有结束,实在等不下去手动停止了,而二分查找耗时455s左右。

个人遇到的技术难点有:

  1. python 下载大文件
  2. 读取文件并转换