AkshatShukla/Information-Retrieval-System
Prerequisites to run this project:
1) Python environment
2) A Java IDE such as Eclipse, to import the Lucene folders as Java projects

Steps for project execution:
NOTE: Execute each file in the exact order listed below. The input (I/P) and output (O/P) locations of every file are listed for better understanding.

Execute a Python file with:
> python file_name.py

Only the Lucene model is implemented in Java. To execute it, import the entire project into Eclipse and choose Run As > Java Application. The Lucene programs are:
1) Phase 1/Task 1/Step 4/Lucene/src/Lucene.java
2) Phase 1/Task 3/Part A/Step 4/Lucene (Stopped)/src/Lucene.java
3) Phase 1/Task 3/Part B/Step 3/Lucene (Stemmed)/src/Lucene.java

-----------------------------------------------------------------------------------------------------------------------------
PHASE 1:

TASK 1
Steps:
1. Execute Phase 1/Task 1/Step 1/tokenizer.py
   - I/P: Raw HTML dir
   - O/P: Phase 1/Task 1/Step 1/Tokenizer Output/
2. Execute Phase 1/Task 1/Step 2/create_inverted_list.py
   - I/P: Phase 1/Task 1/Step 1/Tokenizer Output/
   - O/P: Phase 1/Task 1/Step 2/Inverted_List.txt
   - O/P: Phase 1/Task 1/Step 2/DocumentID_DocLen.txt
   - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Inverted_List.txt
   - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-DocumentID_DocLen.txt
3. Execute Phase 1/Task 1/Step 3/query_cleaning.py
   - I/P: Phase 1/Task 1/Step 3/cacm.query.txt
   - O/P: Phase 1/Task 1/Step 3/Cleaned_Queries.txt
   - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
4. Execute the 4 ranking algorithms (BM25, Lucene, TF-IDF, Query Likelihood Model):
   A. BM25
      a. Execute Phase 1/Task 1/Step 4/BM25/bm25_no_relevance.py
         - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Inverted_List.txt
         - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-DocumentID_DocLen.txt
         - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
         - O/P: Phase 1/Task 1/Step 4/BM25/BM25_NonRelevance_Top5_Docs.txt
         - O/P: Phase 1/Task 1/Step 4/BM25/BM25_NonRelevance_Top5_Query_Pages.txt
         - O/P: Phase 1/Task 1/Step 4/BM25/BM25_NonRelevance_Top100_Pages.txt
         - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25-NoRelevance-Top100Docs-perQuery/
         - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25_NoRelevance_Top5_Query_Pages.txt
      b. Execute Phase 1/Task 1/Step 4/BM25/rel_doc.py
         - I/P: cacm.rel.txt
         - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-QueryID_RelevantDocs.txt
      c. Execute Phase 1/Task 1/Step 4/BM25/bm25_relevance.py
         - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Inverted_List.txt
         - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-DocumentID_DocLen.txt
         - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
         - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-QueryID_RelevantDocs.txt
         - O/P: Phase 1/Task 1/Step 4/BM25/BM25_Relevance_Top5_Docs.txt
         - O/P: Phase 1/Task 1/Step 4/BM25/BM25_Relevance_Top5_Query_Pages.txt
         - O/P: Phase 1/Task 1/Step 4/BM25/BM25_Relevance_Top100_Pages.txt
         - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25-Relevance-Top100Docs-perQuery/
         - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25-Relevance-Top5Docs-perQuery/
         - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25_Relevance_Top5_Query_Pages.txt
   B. Execute Phase 1/Task 1/Step 4/TF-IDF/tf-idf_normalized.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Inverted_List.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-DocumentID_DocLen.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
      - O/P: Phase 1/Task 1/Step 4/TF-IDF/TF_IDF_Normalized_Top5_Docs.txt
      - O/P: Phase 1/Task 1/Step 4/TF-IDF/TF_IDF_Normalized_Top5_Query_Pages.txt
      - O/P: Phase 1/Task 1/Step 4/TF-IDF/TF_IDF_Normalized_Top100_Pages.txt
      - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-TF-IDF-Normalized-Top100Docs-perQuery/
      - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-TF_IDF_Normalized_Top5_Query_Pages.txt
   C. Execute Phase 1/Task 1/Step 4/QLM/QLM.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Inverted_List.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-DocumentID_DocLen.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
      - O/P: Phase 1/Task 1/Step 4/QLM/QLM_Top5_Docs.txt
      - O/P: Phase 1/Task 1/Step 4/QLM/QLM_Top5_Query_Pages.txt
      - O/P: Phase 1/Task 1/Step 4/QLM/QLM_Top100_Pages.txt
      - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-QLM-Top100Docs-perQuery/
      - O/P: Phase 1/Task 1/Encoded Data Structures/Encoded-QLM_Top5_Query_Pages.txt
   D. Execute Phase 1/Task 1/Step 4/Lucene/src/Lucene.java

TASK 2
Steps:
1. Execute generate_QueryID_Top5Docs_Dictionary.py
   - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25-Relevance-Top5Docs-perQuery/
   - O/P: Phase 1/Task 2/Encoded Data Structures (PRF)/Encoded-QueryID_Top5Docs_BM25_Relevance.txt
2. Execute creating_inv_list_for_top_5.py
   - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
   - I/P: Phase 1/Task 2/common_words.txt
   - I/P: Phase 1/Task 2/Encoded Data Structures (PRF)/Encoded-QueryID_Top5Docs_BM25_Relevance.txt
   - I/P: Phase 1/Task 1/Step 1/Tokenizer Output/
   - O/P: Phase 1/Task 2/Encoded Data Structures (PRF)/Encoded-Queries_With_Their_Expansion_Terms.txt
3. Execute create_expanded_queries.py
   - I/P: Phase 1/Task 2/Encoded Data Structures (PRF)/Encoded-Queries_With_Their_Expansion_Terms.txt
   - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-QueryID_RelevantDocs.txt
   - O/P: Phase 1/Task 2/Encoded Data Structures (PRF)/Encoded-Expanded_Queries.txt
4. Execute bm25_Relevance_PRF.py
   - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Inverted_List.txt
   - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-DocumentID_DocLen.txt
   - I/P: Phase 1/Task 2/Encoded Data Structures (PRF)/Encoded-Expanded_Queries.txt
   - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-QueryID_RelevantDocs.txt
   - I/P: Phase 1/Task 1/Step 1/Tokenizer Output/
   - O/P: Phase 1/Task 2/Step 4/BM25_Relevance_PRF_Top100_Pages.txt
   - O/P: Phase 1/Task 2/Encoded Data Structures (PRF)/Encoded-BM25-Relevance-PRF-Top100Docs-perQuery/

TASK 3
Part A (Stopping):
Steps:
1. Execute Phase 1/Task 3/Part A/Step 1/tokenizer_with_stopping.py
   - I/P: Raw HTML dir
   - I/P: Phase 1/Task 3/Part A/common_words.txt
   - O/P: Phase 1/Task 3/Part A/Step 1/Stopped Tokenizer Output/
2. Execute Phase 1/Task 3/Step 2/create_stopped_inverted_list.py
   - I/P: Phase 1/Task 3/Part A/Step 1/Stopped Tokenizer Output/
   - O/P: Phase 1/Task 3/Part A/Step 2/Stopped_Inverted_List.txt
   - O/P: Phase 1/Task 3/Part A/Step 2/Stopped_DocumentID_DocLen.txt
   - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_Inverted_List.txt
   - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_DocumentID_DocLen.txt
3. Execute Phase 1/Task 3/Step 3/query_cleaning_stopwords_removed.py
   - I/P: Phase 1/Task 3/Part A/common_words.txt
   - I/P: Phase 1/Task 3/Part A/Step 3/cacm.query.txt
   - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Cleaned_Queries_Stopped.txt
   - O/P: Phase 1/Task 3/Part A/Step 3/Cleaned_Queries_Stopped.txt
4. Run the ranking models on the stopped data:
   Part A (BM25 (Stopped)):
      1. Execute Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/bm25_no_relevance_stopping.py
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_Inverted_List.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_DocumentID_DocLen.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Cleaned_Queries_Stopped.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/Stopped_BM25_NoRelevance_Top5_Docs.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/Stopped_BM25_NoRelevance_Top5_Query_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/Stopped_BM25_NoRelevance_Top100_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_BM25-NoRelevance-Top100Docs-perQuery/
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_BM25_NoRelevance_Top5_Query_Pages.txt
      2. Execute Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/rel-doc-stopped.py
         - I/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/cacm.rel.txt
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_QueryID_RelevantDocs.txt
      3. Execute Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/bm25_relevance_stopping.py
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_Inverted_List.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_DocumentID_DocLen.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Cleaned_Queries_Stopped.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_QueryID_RelevantDocs.txt
         - I/P: Phase 1/Task 3/Part A/Step 1/Stopped Tokenizer Output/
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/Stopped_BM25_Relevance_Top5_Docs.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/Stopped_BM25_Relevance_Top5_Query_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/Stopped_BM25_Relevance_Top100_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_BM25-Relevance-Top100Docs-perQuery/
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_BM25_Relevance_Top5_Query_Pages.txt
   Part B (QLM (Stopped)):
      Execute Phase 1/Task 3/Part A/Step 4/QLM (Stopped)/qlm_stopping.py
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_Inverted_List.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_DocumentID_DocLen.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Cleaned_Queries_Stopped.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/QLM (Stopped)/Stopped_QLM_Top5_Docs.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/QLM (Stopped)/Stopped_QLM_Top5_Query_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/QLM (Stopped)/Stopped_QLM_Top100_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_QLM-Top100Docs-perQuery/
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_QLM_Top5_Query_Pages.txt
   Part C (TF-IDF (Stopped)):
      Execute Phase 1/Task 3/Part A/Step 4/TF-IDF (Stopped)/tf-idf_normalized_stopping.py
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_Inverted_List.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_DocumentID_DocLen.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Cleaned_Queries_Stopped.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/TF-IDF (Stopped)/Stopped_TF-IDF_Normalized_Top5_Docs.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/TF-IDF (Stopped)/Stopped_TF-IDF_Normalized_Top5_Query_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/TF-IDF (Stopped)/Stopped_TF-IDF_Normalized_Top100_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_TF-IDF-Normalized-Top100Docs-perQuery/
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_TF-IDF_Normalized_Top5_Query_Pages.txt
   Part D (Lucene):
      Execute Phase 1/Task 3/Part A/Step 4/Lucene (Stopped)/src/Lucene.java

Part B (Stemming):
Steps:
1. Execute Phase 1/Task 3/Part B/Step 1/cacm_stem_extracter.py
   - I/P: Phase 1/Task 3/Part B/Step 1/cacm_stem.txt
   - O/P: Phase 1/Task 3/Part B/Step 1/Stemmed_Corpus/
2. Execute Phase 1/Task 3/Part B/Step 2/create_stemmed_inverted_list.py
   - I/P: Phase 1/Task 3/Part B/Step 1/Stemmed_Corpus/
   - O/P: Phase 1/Task 3/Part B/Step 2/Stemmed_Inverted_List.txt
   - O/P: Phase 1/Task 3/Part B/Step 2/Stemmed_DocumentID_DocLen.txt
   - O/P: Phase 1/Task 3/Part B/Encoded Data Structures (Stemmed)/Encoded-Stemmed_Inverted_List.txt
   - O/P: Phase 1/Task 3/Part B/Encoded Data Structures (Stemmed)/Encoded-Stemmed_DocumentID_DocLen.txt
3. Run the ranking models on the stemmed data:
   Part A (BM25 (Stemmed)):
      1. Execute Phase 1/Task 3/Part A/Step 4/BM25 (Stemmed)/bm25_no_relevance_stemming.py
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_Inverted_List.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_DocumentID_DocLen.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Cleaned_Queries_Stemmed.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stemmed)/Stemmed_BM25_NoRelevance_Top5_Docs.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stemmed)/Stemmed_BM25_NoRelevance_Top5_Query_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stemmed)/Stemmed_BM25_NoRelevance_Top100_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_BM25-NoRelevance-Top100Docs-perQuery/
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_BM25_NoRelevance_Top5_Query_Pages.txt
      2. Execute Phase 1/Task 3/Part A/Step 4/BM25 (Stemmed)/rel-doc-stemmed.py
         - I/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stemmed)/cacm.rel.txt
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_QueryID_RelevantDocs.txt
      3. Execute Phase 1/Task 3/Part A/Step 4/BM25 (Stemmed)/bm25_relevance_stemming.py
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_Inverted_List.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_DocumentID_DocLen.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Cleaned_Queries_Stemmed.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_QueryID_RelevantDocs.txt
         - I/P: Phase 1/Task 3/Part A/Step 1/Stemmed Tokenizer Output/
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stemmed)/Stemmed_BM25_Relevance_Top5_Docs.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stemmed)/Stemmed_BM25_Relevance_Top5_Query_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stemmed)/Stemmed_BM25_Relevance_Top100_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_BM25-Relevance-Top100Docs-perQuery/
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_BM25_Relevance_Top5_Query_Pages.txt
   Part B (QLM (Stemmed)):
      Execute Phase 1/Task 3/Part A/Step 4/QLM (Stemmed)/qlm_stemming.py
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_Inverted_List.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_DocumentID_DocLen.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Cleaned_Queries_Stemmed.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/QLM (Stemmed)/Stemmed_QLM_Top5_Docs.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/QLM (Stemmed)/Stemmed_QLM_Top5_Query_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/QLM (Stemmed)/Stemmed_QLM_Top100_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_QLM-Top100Docs-perQuery/
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_QLM_Top5_Query_Pages.txt
   Part C (TF-IDF (Stemmed)):
      Execute Phase 1/Task 3/Part A/Step 4/TF-IDF (Stemmed)/tf-idf_normalized_stemming.py
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_Inverted_List.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_DocumentID_DocLen.txt
         - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Cleaned_Queries_Stemmed.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/TF-IDF (Stemmed)/Stemmed_TF-IDF_Normalized_Top5_Docs.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/TF-IDF (Stemmed)/Stemmed_TF-IDF_Normalized_Top5_Query_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Step 4/TF-IDF (Stemmed)/Stemmed_TF-IDF_Normalized_Top100_Pages.txt
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_TF-IDF-Normalized-Top100Docs-perQuery/
         - O/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stemmed)/Encoded-Stemmed_TF-IDF_Normalized_Top5_Query_Pages.txt
   Part D (Lucene):
      Execute Phase 1/Task 3/Part A/Step 4/Lucene (Stemmed)/src/Lucene.java

-----------------------------------------------------------------------------------------------------------------------------
PHASE 2:
Execute Phase 2/Snippet_generation.py
   - I/P: Raw HTML dir
   - I/P: Phase 2/common_word.txt
   - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
   - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_Inverted_List.txt
   - I/P: Phase 1/Task 3/Part A/Step 4/Lucene (Stopped)/Stopped_Lucene_Top5_Docs.txt
   - I/P: Phase 1/Task 3/Part A/Step 4/BM25 (Stopped)/Stopped_BM25_NoRelevance_Top5_Docs.txt
   - I/P: Phase 1/Task 3/Part A/Step 4/QLM (Stopped)/Stopped_QLM_Top5_Docs.txt
   - I/P: Phase 1/Task 3/Part A/Step 4/TF-IDF (Stopped)/Stopped_TF_IDF_Normalized_Top5_Docs.txt
   - O/P: Phase 2/Snippets_Text/Snippets_Stopped_Lucene.txt
   - O/P: Phase 2/Snippets_Text/Snippets_Stopped_BM25_NoRelevance.txt
   - O/P: Phase 2/Snippets_Text/Snippets_Stopped_QLM.txt
   - O/P: Phase 2/Snippets_Text/Snippets_Stopped_TF_IDF_Normalized.txt
   - O/P: Phase 2/Snippets_HTML/Snippets_Stopped_Lucene.html
   - O/P: Phase 2/Snippets_HTML/Snippets_Stopped_BM25_NoRelevance.html
   - O/P: Phase 2/Snippets_HTML/Snippets_Stopped_QLM.html
   - O/P: Phase 2/Snippets_HTML/Snippets_Stopped_TF_IDF_Normalized.html

-----------------------------------------------------------------------------------------------------------------------------
PHASE 3:
Steps:
1. Generate the per-query top-100 document lists:
   Part A. Execute Phase 3/Step 1/BM25 (No Relevance)/generate_QueryID_Top100Docs_bm25_no_relevance.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25-NoRelevance-Top100Docs-perQuery/
      - O/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_BM25_NoRelevance.txt
   Part B. Execute Phase 3/Step 1/BM25 (Relevance)/generate_QueryID_Top100Docs_bm25_relevance.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25-Relevance-Top100Docs-perQuery/
      - O/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_BM25_Relevance.txt
   Part C. Execute Phase 3/Step 1/Lucene/generate_QueryID_Top100Docs_lucene.py
      - I/P: Phase 1/Task 1/Step 4/Lucene/Lucene_Top100_Docs.txt
      - O/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_Lucene.txt
   Part D. Execute Phase 3/Step 1/QLM/generate_QueryID_Top100Docs_QLM.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-QLM-Top100Docs-perQuery/
      - O/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_QLM.txt
   Part E. Execute Phase 3/Step 1/TF-IDF/generate_QueryID_Top100Docs_tf-idf_normalized.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-TF-IDF-Normalized-Top100Docs-perQuery/
      - O/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_tf-idf_normalized.txt
   Part F. Execute Phase 3/Step 1/BM25 (Relevance with PRF)/generate_QueryID_Top100Docs_bm25_relevance_PRF.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25-Relevance-PRF-Top100Docs-perQuery/
      - O/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_BM25_Relevance_PRF.txt
   Part G. Execute Phase 3/Step 1/BM25 (Relevance Stopped)/generate_QueryID_Top100Docs_bm25_relevance_stopped.py
      - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_BM25-Relevance-Top100Docs-perQuery
      - O/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_Stopped_BM25_Relevance.txt
   Part H. Execute Phase 3/Step 1/Lucene (Stopped)/generate_QueryID_Top100Docs_lucene_stopped.py
      - I/P: Phase 1/Task 3/Part A/Step 4/Lucene (Stopped)/Stopped_Lucene_Top100_Docs.txt
      - O/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_Stopped_Lucene.txt
   Part I. Execute Phase 3/Step 1/TF-IDF (Stopped)/generate_QueryID_Top100Docs_tf-idf_normalized_stopped.py
      - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_TF-IDF-Normalized-Top100Docs-perQuery
      - O/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_Stopped_tf-idf_normalized.txt
2. Evaluate each model:
   Part A. Execute Phase 3/Step 2/BM25 (No Relevance)/retrieval_model_evaluation_bm25_no_relevance.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-QueryID_RelevantDocs.txt
      - I/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_BM25_NoRelevance.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
      - O/P: Phase 3/Precision Recall Tables/BM25 Evaluation Results/BM25 (No Relevance)/
   Part B. Execute Phase 3/Step 2/BM25 (Relevance)/retrieval_model_evaluation_bm25_relevance.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25-Relevance-Top100Docs-perQuery/
      - I/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_BM25_Relevance.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
      - O/P: Phase 3/Precision Recall Tables/BM25 Evaluation Results/BM25 (Relevance)/
   Part C. Execute Phase 3/Step 2/Lucene/retrieval_model_evaluation_lucene.py
      - I/P: Phase 1/Task 1/Step 4/Lucene/Lucene_Top100_Docs.txt
      - I/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_Lucene.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
      - O/P: Phase 3/Precision Recall Tables/Lucene Evaluation Results/Lucene/
   Part D. Execute Phase 3/Step 2/QLM/retrieval_model_evaluation_QLM.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-QLM-Top100Docs-perQuery/
      - I/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_QLM.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
      - O/P: Phase 3/Precision Recall Tables/QLM Evaluation Results/QLM/
   Part E. Execute Phase 3/Step 2/TF-IDF/retrieval_model_evaluation_tf-idf_normalized.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-TF-IDF-Normalized-Top100Docs-perQuery/
      - I/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_tf-idf_normalized.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
      - O/P: Phase 3/Precision Recall Tables/TF-IDF Evaluation Results/TF-IDF/
   Part F. Execute Phase 3/Step 1/BM25 (Relevance with PRF)/retrieval_model_evaluation_by_bm25_with_relevance_for_PRM_s_top_100.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-BM25-Relevance-Top100Docs-perQuery/
      - I/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_BM25_Relevance_PRF.txt
      - I/P: Phase 1/Task 2/Encoded-Expanded_Queries.txt
      - O/P: Phase 3/Precision Recall Tables/BM25 (Relevance with PRF)/
   Part G. Execute Phase 3/Step 1/BM25 (Relevance Stopped)/retrieval_model_evaluation_relevance_stopped.py
      - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_BM25-Relevance-Top100Docs-perQuery/
      - I/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_Stopped_BM25_Relevance.txt
      - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Cleaned_Queries_Stopped.txt
      - O/P: Phase 3/Precision Recall Tables/BM25 (Relevance Stopped)/
   Part H. Execute Phase 3/Step 1/Lucene (Stopped)/retrieval_model_evaluation_lucene_stopped.py
      - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_QueryID_RelevantDocs.txt
      - I/P: Phase 3/Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_Stopped_Lucene.txt
      - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Cleaned_Queries_Stopped.txt
      - O/P: Phase 3/Precision Recall Tables/Lucene (Stopped)/
   Part I. Execute Phase 3/Step 1/TF-IDF (Stopped)/retrieval_model_evaluation_tf-idf_normalized_stopped.py
      - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Stopped_QueryID_RelevantDocs.txt
      - I/P: Encoded Data Structures (Phase 3)/Encoded-QueryID_Top100Docs_Stopped_tf-idf_normalized.txt
      - I/P: Phase 1/Task 3/Part A/Encoded Data Structures (Stopped)/Encoded-Cleaned_Queries_Stopped.txt
      - O/P: Phase 3/Precision Recall Tables/TF-IDF-Normalized (Stopped)/

-----------------------------------------------------------------------------------------------------------------------------
Supplementary Features:
Part A (No Stopping):
   Execute Supplementary Features/Part A (No Stopping)/create_inverted_list_with_positions_no_stop.py
      - I/P: Phase 1/Task 1/Step 1/Tokenizer Output
      - O/P: Supplementary Features/Inverted_List_With_Positions_No_Stopping.txt
      - O/P: Supplementary Features/Encoded Data Structures (Bonus)/Encoded-Inverted_List_Position_No_Stopping.txt
   Execute Supplementary Features/Part A (No Stopping)/proximity_no_stop.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-DocumentID_DocLen.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
      - I/P: Supplementary Features/Encoded Data Structures (Bonus)/Encoded-Inverted_List_Position_No_Stopping.txt
      - O/P: Supplementary Features/Encoded Data Structures (Bonus)/Encoded-Top100-Docs-Proximity-NoStopping/
      - O/P: Supplementary Features/Part A (No Stopping)/Proximity_NoStopping_Top100_Pages.txt
Part B (Stopping):
   Execute Supplementary Features/Part B (Stopping)/create_inverted_list_with_positions_stop.py
      - I/P: Phase 1/Task 3/Part A/Step 1/Stopped Tokenizer Output
      - I/P: Phase 1/Task 3/Part A/common_words.txt
      - O/P: Supplementary Features/Inverted_List_With_Positions_Stopping.txt
      - O/P: Supplementary Features/Encoded Data Structures (Bonus)/Encoded-Inverted_List_Position_With_Stopping.txt
   Execute Supplementary Features/Part B (Stopping)/proximity_with_stop.py
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-DocumentID_DocLen.txt
      - I/P: Phase 1/Task 1/Encoded Data Structures/Encoded-Cleaned_Queries.txt
      - I/P: Supplementary Features/Encoded Data Structures (Bonus)/Encoded-Inverted_List_Position_With_Stopping.txt
      - O/P: Supplementary Features/Encoded Data Structures (Bonus)/Encoded-Top100-Docs-Proximity-With-Stopping/
      - O/P: Supplementary Features/Part B (Stopping)/Proximity_With_Stopping_Top100_Pages.txt
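Phase 1, Task 1, Step 2 (create_inverted_list.py) turns the tokenizer output into an inverted list and a document-length table. The sketch below shows one way to build both structures in memory. It assumes each document has already been tokenized into a list of terms and uses a term -> {doc_id: tf} layout; that layout and the helper name build_inverted_index are hypothetical, not taken from the repository's encoded files.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Return (index, doc_len) for a dict of doc_id -> list of tokens.

    index   maps term -> {doc_id: term frequency in that document}
    doc_len maps doc_id -> number of tokens in the document
    (Hypothetical layout; the repository's encoded files may differ.)
    """
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, tokens in docs.items():
        doc_len[doc_id] = len(tokens)
        # Count each term once per document, then record it in the postings
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return dict(index), doc_len


if __name__ == "__main__":
    # Toy documents, not CACM data
    docs = {
        "D1": ["information", "retrieval", "retrieval"],
        "D2": ["retrieval", "model"],
    }
    index, doc_len = build_inverted_index(docs)
    print(index["retrieval"])  # {'D1': 2, 'D2': 1}
    print(doc_len)             # {'D1': 3, 'D2': 2}
```

Storing only term frequencies keeps the postings small; a positional variant, like the one the Supplementary Features appear to build for proximity scoring, would store a list of token positions per document instead.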
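Phase 1, Task 1, Step 4 (item A) runs BM25 twice: once without relevance information (bm25_no_relevance.py) and once using the judged documents from cacm.rel.txt (bm25_relevance.py). The README does not state the exact formula or parameters, so the following is a sketch under assumptions: it uses the common BM25 form with the Robertson/Sparck Jones relevance weight and the textbook defaults k1 = 1.2, b = 0.75, k2 = 100. With an empty relevant set (R = r = 0) the weight reduces to log((N - n + 0.5) / (n + 0.5)), which is the no-relevance variant. The function name bm25_rank and the term -> {doc_id: tf} index layout are hypothetical.

```python
import math
from collections import Counter


def bm25_rank(query_terms, index, doc_len, relevant_docs=frozenset(),
              k1=1.2, b=0.75, k2=100.0, top_k=100):
    """Rank documents with BM25; parameter defaults are assumed, not from the repo.

    index: term -> {doc_id: tf}; doc_len: doc_id -> length in tokens;
    relevant_docs: set of doc_ids judged relevant for this query (may be empty).
    """
    if not doc_len:
        return []
    N = len(doc_len)                        # documents in the collection
    avdl = sum(doc_len.values()) / N        # average document length
    R = len(relevant_docs)                  # known relevant documents
    scores = {}
    for term, qf in Counter(query_terms).items():
        postings = index.get(term)
        if not postings:
            continue                        # term absent from the collection
        n = len(postings)                   # documents containing the term
        r = sum(1 for d in relevant_docs if d in postings)
        # Robertson/Sparck Jones weight; equals log((N-n+0.5)/(n+0.5)) when R = r = 0
        w = math.log(((r + 0.5) / (R - r + 0.5)) /
                     ((n - r + 0.5) / (N - n - R + r + 0.5)))
        qtf = (k2 + 1) * qf / (k2 + qf)     # query-term frequency component
        for doc_id, f in postings.items():
            K = k1 * ((1 - b) + b * doc_len[doc_id] / avdl)
            scores[doc_id] = scores.get(doc_id, 0.0) + w * (k1 + 1) * f / (K + f) * qtf
    # Highest score first; ties broken by doc_id for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    # Toy collection of five equal-length documents
    index = {"retrieval": {"D1": 1}, "model": {"D1": 1, "D2": 1}}
    doc_len = {"D1": 2, "D2": 2, "D3": 2, "D4": 2, "D5": 2}
    print(bm25_rank(["retrieval", "model"], index, doc_len)[0][0])          # D1
    print(bm25_rank(["retrieval", "model"], index, doc_len, {"D2"})[0][0])  # D2
```

The toy run shows why relevance information matters: once D2 is marked relevant, "model" receives a higher weight and "retrieval" (absent from the relevant document) a negative one, so D2 overtakes D1.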
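Phase 1, Task 1, Step 4 (item C) ranks documents with the Query Likelihood Model (QLM.py). The README does not say which smoothing the script uses; this sketch assumes Jelinek-Mercer smoothing, scoring each document as the sum over query terms of log((1 - λ)·f(q,D)/|D| + λ·c(q)/|C|), with λ = 0.35 as an assumed value. Query terms that never occur in the collection are skipped because their probability would be zero. The name qlm_rank and the term -> {doc_id: tf} index layout are hypothetical.

```python
import math
from collections import Counter


def qlm_rank(query_terms, index, doc_len, lam=0.35, top_k=100):
    """Query likelihood with Jelinek-Mercer smoothing (lam is an assumed value).

    index: term -> {doc_id: tf}; doc_len: doc_id -> length in tokens.
    """
    collection_len = sum(doc_len.values())          # |C|
    query_counts = Counter(query_terms)
    # Collection frequency c(q) of each query term found in the index
    cf = {t: sum(index[t].values()) for t in query_counts if t in index}
    scores = {}
    for doc_id, dl in doc_len.items():
        score = 0.0
        for term, qf in query_counts.items():
            if not cf.get(term):
                continue                            # unseen term: log(0) is undefined
            f = index[term].get(doc_id, 0)
            p_doc = f / dl if dl else 0.0           # maximum-likelihood document model
            p_coll = cf[term] / collection_len      # background (collection) model
            score += qf * math.log((1 - lam) * p_doc + lam * p_coll)
        scores[doc_id] = score
    # Highest log-likelihood first; ties broken by doc_id
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    index = {"retrieval": {"D1": 2, "D2": 1}, "model": {"D2": 1}}
    doc_len = {"D1": 4, "D2": 4}
    print([d for d, _ in qlm_rank(["retrieval", "model"], index, doc_len)])  # ['D2', 'D1']
```

Unlike BM25, every document receives a score here, because the collection model gives non-zero probability even to documents that contain none of the query terms.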
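Task 2 performs pseudo-relevance feedback: the top 5 documents returned by BM25 with relevance are treated as relevant, expansion terms are taken from them with the words in common_words.txt excluded, and create_expanded_queries.py writes the expanded queries. The README does not describe how expansion terms are chosen, so the sketch below assumes the simplest scheme, picking the most frequent non-stopword terms across the feedback documents, with n_terms = 10 as an illustrative value. The function name expand_query is hypothetical.

```python
from collections import Counter


def expand_query(query_terms, feedback_docs, stopwords, n_terms=10):
    """Append the n_terms most frequent new terms from the feedback documents.

    feedback_docs: list of token lists, e.g. the top-5 documents for the query.
    stopwords: set of words to ignore (for instance loaded from common_words.txt).
    """
    original = set(query_terms)
    counts = Counter()
    for tokens in feedback_docs:
        # Skip stopwords and terms the query already contains
        counts.update(t for t in tokens if t not in stopwords and t not in original)
    # Most frequent first; alphabetical order breaks ties deterministically
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:n_terms]]


if __name__ == "__main__":
    feedback = [["parallel", "processors", "the", "processors"],
                ["processors", "memory", "the"]]
    print(expand_query(["parallel", "computing"], feedback, {"the"}, n_terms=2))
    # ['parallel', 'computing', 'processors', 'memory']
```

Raw frequency favors terms that are common across the whole collection; weighting candidates by an IDF factor is a common alternative if the expanded queries drift off-topic.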
About
Comparative study of information retrieval systems based on retrieval effectiveness on TREC CACM data
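Phase 3 compares the runs by building precision/recall tables from each model's top 100 documents per query and the relevance judgments derived from cacm.rel.txt. As a sketch using the standard definitions, the per-rank table, average precision, MAP, MRR and P@K could be computed as below. Skipping queries that have no relevance judgments when averaging is an assumption about how the project treats them, and all function names are hypothetical.

```python
def precision_recall_table(ranked_docs, relevant):
    """Return (rank, doc_id, is_relevant, precision, recall) for each rank."""
    rows, hits = [], 0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        is_rel = doc_id in relevant
        if is_rel:
            hits += 1
        recall = hits / len(relevant) if relevant else 0.0
        rows.append((rank, doc_id, is_rel, hits / rank, recall))
    return rows


def average_precision(ranked_docs, relevant):
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reciprocal_rank(ranked_docs, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top k documents that are relevant."""
    return sum(1 for d in ranked_docs[:k] if d in relevant) / k


def evaluate(run, qrels):
    """run: query_id -> ranked doc_ids; qrels: query_id -> set of relevant doc_ids.

    Returns (MAP, MRR) over the queries that have at least one judgment.
    """
    judged = [q for q in run if qrels.get(q)]   # skip unjudged queries (assumption)
    if not judged:
        return 0.0, 0.0
    mean_ap = sum(average_precision(run[q], qrels[q]) for q in judged) / len(judged)
    mrr = sum(reciprocal_rank(run[q], qrels[q]) for q in judged) / len(judged)
    return mean_ap, mrr


if __name__ == "__main__":
    ranked = ["D3", "D1", "D5", "D2"]
    relevant = {"D1", "D2"}
    print(average_precision(ranked, relevant))  # 0.5
    print(evaluate({"1": ranked, "2": ["D9"]}, {"1": relevant}))  # (0.5, 0.5)
```

Because MAP and MRR are averaged over judged queries only, two runs are comparable only if they are evaluated against the same set of judged queries.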