# Search Engine using the Vector Space Model
- The code is split into three Python files for readability:
1) Prepro.py: contains the function that preprocesses the input passed to it.
2) out.py: contains the output function, which prints the results in the format of the sample output.
3) driver.py: contains the driver code that computes TF, IDF, TF-IDF and cosine similarity.
Note: comments in each file indicate how the process is carried out.
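The TF, IDF, TF-IDF and cosine similarity steps handled by driver.py can be sketched roughly as below. This is a minimal illustration, not the code in this repository: every function name is hypothetical, the tokenizer is a simple stand-in for Prepro.py, and the weighting (relative term frequency with a natural-log IDF) is an assumption, since the README does not state which variant is used.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for the preprocessing in Prepro.py:
    # lowercase the text and split on whitespace.
    return text.lower().split()


def term_frequencies(tokens):
    # Relative term frequency: count of the term / number of tokens.
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequencies(tokenized_docs):
    # Assumed variant: idf(t) = ln(N / df(t)), with df counted once per document.
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector as a dict; terms unseen in the collection are dropped.
    tf = term_frequencies(tokens)
    return {term: weight * idf[term] for term, weight in tf.items() if term in idf}


def cosine_similarity(vec_a, vec_b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    len_a = math.sqrt(sum(w * w for w in vec_a.values()))
    len_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if len_a == 0 or len_b == 0:
        return 0.0
    return dot / (len_a * len_b)


def rank_documents(query, docs):
    # Returns (document index, score) pairs, best match first.
    tokenized_docs = [tokenize(doc) for doc in docs]
    idf = inverse_document_frequencies(tokenized_docs)
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in tokenized_docs]
    query_vector = tfidf_vector(tokenize(query), idf)
    scores = [(i, cosine_similarity(query_vector, vec))
              for i, vec in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, rank_documents("cat", ["the cat sat", "the dog barked", "cat and dog"]) places the second document last with a score of 0, because it shares no term with the query.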
- For both queries and data, preprocessing and tokenization are performed, and TF, IDF and TF-IDF dictionaries are built accordingly;
these are then used to calculate the document length and cosine similarity. Finally, the output function calculates precision
and recall for each query, along with the average precision and recall across all queries.
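The precision and recall calculation described above can be sketched as follows. This is an illustrative version, not the function in out.py: the names are hypothetical, and it assumes that precision is the number of relevant documents among the top k divided by k, and that the averages are plain means over the queries. Both assumptions appear consistent with the figures reported in this README (for instance, a precision of 0.2 at k = 10 corresponds to 2 relevant documents in the top 10).

```python
def precision_recall_at_k(ranked_doc_ids, relevant_doc_ids, k):
    # ranked_doc_ids: document ids ordered by decreasing cosine similarity.
    # relevant_doc_ids: ids judged relevant for this query.
    retrieved = ranked_doc_ids[:k]
    relevant = set(relevant_doc_ids)
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_recall(per_query_results):
    # per_query_results: list of (precision, recall) pairs, one per query.
    n_queries = len(per_query_results)
    avg_precision = sum(p for p, _ in per_query_results) / n_queries
    avg_recall = sum(r for _, r in per_query_results) / n_queries
    return avg_precision, avg_recall
```

With a ranking of [3, 7, 1, 9, 4] and relevant documents {7, 4, 8}, the top 5 contain 2 relevant documents, so precision is 2/5 and recall is 2/3.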
- The precision and recall of the method are as follows:
FOR 10 MOST RELEVANT DOCUMENTS
Query 1
Precision 0.0 Recall 0.0
Query 2
Precision 0.2 Recall 0.13333333333333333
Query 3
Precision 0.2 Recall 0.13333333333333333
Query 4
Precision 0.1 Recall 0.05555555555555555
Query 5
Precision 0.2 Recall 0.10526315789473684
Query 6
Precision 0.4 Recall 0.2222222222222222
Query 7
Precision 0.6 Recall 0.6666666666666666
Query 8
Precision 0.2 Recall 0.5
Query 9
Precision 0.2 Recall 0.25
Query 10
Precision 0.2 Recall 0.08333333333333333
Average precision 0.23000000000000004
Average recall 0.21497076023391815
FOR 50 MOST RELEVANT DOCUMENTS
Query 1
Precision 0.0 Recall 0.0
Query 2
Precision 0.08 Recall 0.26666666666666666
Query 3
Precision 0.12 Recall 0.4
Query 4
Precision 0.04 Recall 0.1111111111111111
Query 5
Precision 0.2 Recall 0.5263157894736842
Query 6
Precision 0.12 Recall 0.3333333333333333
Query 7
Precision 0.16 Recall 0.8888888888888888
Query 8
Precision 0.06 Recall 0.75
Query 9
Precision 0.1 Recall 0.625
Query 10
Precision 0.08 Recall 0.16666666666666666
Average precision 0.096
Average recall 0.4067982456140351
FOR 100 MOST RELEVANT DOCUMENTS
Query 1
Precision 0.0 Recall 0.0
Query 2
Precision 0.1 Recall 0.6666666666666666
Query 3
Precision 0.09 Recall 0.6
Query 4
Precision 0.06 Recall 0.3333333333333333
Query 5
Precision 0.13 Recall 0.6842105263157895
Query 6
Precision 0.08 Recall 0.4444444444444444
Query 7
Precision 0.08 Recall 0.8888888888888888
Query 8
Precision 0.03 Recall 0.75
Query 9
Precision 0.06 Recall 0.75
Query 10
Precision 0.04 Recall 0.16666666666666666
Average precision 0.06700000000000002
Average recall 0.528421052631579
FOR 500 MOST RELEVANT DOCUMENTS
Query 1
Precision 0.002 Recall 1.0
Query 2
Precision 0.03 Recall 1.0
Query 3
Precision 0.03 Recall 1.0
Query 4
Precision 0.032 Recall 0.8888888888888888
Query 5
Precision 0.038 Recall 1.0
Query 6
Precision 0.032 Recall 0.8888888888888888
Query 7
Precision 0.018 Recall 1.0
Query 8
Precision 0.008 Recall 1.0
Query 9
Precision 0.016 Recall 1.0
Query 10
Precision 0.024 Recall 0.5
Average precision 0.023
Average recall 0.9277777777777778