题目:根据 1.1.10.4 节给出的暴力查找法编写一个程序 BruteForSearch,在你的计算机上比较它和 BinarySearch 处理 largeW.text 和 largeT.text 所需要的时间。
附表 1.1.10.4 的查找程序:
def rank(key, a):
for i in range(0, len(a)):
if key == a[i]:
return i
return -1
解答:先用 urllib3
下载题目中所需要的 largeText.txt 文件,体积90M左右,largeAllowlist.text,7M 左右。
def downloadLargetText(self):
url = 'https://algs4.cs.princeton.edu/11model/largeText.txt'
http = urllib3.PoolManager()
resp = http.request('GET', url, preload_content=False)
with open('/Users/hexintao/Desktop/n/largeText.txt', 'wb') as out:
while True:
data = resp.read(32)
if not data:
break
out.write(data)
resp.release_conn()
然后通过 open
方法加载本地文件并转化为数组,模拟从命令行中读取。
def readList(self, isAllowlist):
path = ''
if isAllowlist:
path = '/Users/hexintao/Desktop/n/largeAllowlist.txt'
else:
path = '/Users/hexintao/Desktop/n/largeText.txt'
with open(path, 'r') as f:
data = f.readlines() #txt中所有字符串读入data
i = -1
for line in data:
i += 1
temp = line.split('\n')
odom = temp[0]
data[i] = int(odom)
return data
查找的程序为:
def SearchTest(self, key, a):
# 二分查找
def binarySearch(k, a):
lo = 0
hi = max([len(a)-1, 0])
while lo <= hi:
mid = lo + (hi-lo)//2
if a[mid] < k:
lo = mid+1
elif a[mid] > k:
hi = mid-1
else:
return mid
return -1
# 暴力查找
def rank(k, a):
for i in range(0, len(a)):
if k == a[i]:
return i
return -1
for i in range(0, len(key)):
res = binarySearch(key[i], a)
# 对比的话,取消下一行注释即可
# res = rank(key[i], a)
if res < 0:
print(f'{key[i]}')
最后执行的主程序:
e = Exercise()
# e.downloadLargetText()
allowList = e.readList(True)
largeList = e.readList(False)
allowList.sort()
startTime = datetime.datetime.now()
matchIdx = e.BruteForSearch(largeList, allowList)
endTime = datetime.datetime.now()
dur = (endTime - startTime).seconds
其中,allowList
和 largeList
的长度分别为:
len(allowList)
1000000
len(largeList)
10000000
最后运行的结果:在我的电脑,2019款13寸 MackbookPro 2.8GHz i7 处理器,暴力查找运行了三个多小时都没有结束,实在等不下去手动停止了,而二分查找耗时455s
左右。
个人遇到的技术难点有:
- python 下载大文件
- 读取文件并转换