CodeMaster: A DSA Problem Search Engine

Project Overview

CodeMaster is a search engine for processing and ranking a large collection of DSA (data structures and algorithms) problems. It scores documents with two established text-retrieval techniques, TF-IDF and BM25, to return the most relevant results for a user query. It also handles query edge cases such as misspellings, camel-cased identifiers, and number words, so searches stay accurate and easy to use.

Key Features

Efficient Search Mechanism: Uses TF-IDF and BM25 algorithms to rank documents by relevance to user queries.
Advanced Text Processing: Handles spelling correction, camelCase splitting, lemmatization, and number-word equivalence.
Real-Time Ranking: Scores and ranks documents dynamically, prioritizing trusted sources and showing the top 10 results.

Workflow

1. Keyword Generation and Scoring

Traditional String Matching: Provides a basic approach but struggles with scalability.
Term Frequency (TF): Counts keyword occurrences but tends to favor longer documents.
Inverse Document Frequency (IDF): Weights terms that appear in few documents more heavily than terms that appear in many.
TF-IDF: Multiplies TF by IDF, so a keyword scores highly when it is frequent in a document but rare across the collection.
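The scoring steps above can be sketched in a few lines of Node.js. This is a minimal illustration, not the project's actual code: the function names, the tiny corpus, the length-normalized TF, and the plain log(N / df) form of IDF are all assumptions chosen for clarity.

```javascript
// Minimal TF-IDF sketch. All names and the sample corpus are
// illustrative assumptions, not taken from the CodeMaster source.

// Term frequency: occurrences of each term divided by document length.
function termFrequency(tokens) {
  const tf = new Map();
  for (const t of tokens) tf.set(t, (tf.get(t) || 0) + 1);
  for (const [t, count] of tf) tf.set(t, count / tokens.length);
  return tf;
}

// Inverse document frequency: log(N / df), where df is the number of
// documents that contain the term at least once.
function inverseDocumentFrequency(docs) {
  const df = new Map();
  for (const tokens of docs) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) || 0) + 1);
  }
  const idf = new Map();
  for (const [t, n] of df) idf.set(t, Math.log(docs.length / n));
  return idf;
}

// TF-IDF weight of every term in every document.
function tfidf(docs) {
  const idf = inverseDocumentFrequency(docs);
  return docs.map((tokens) => {
    const weights = new Map();
    for (const [t, f] of termFrequency(tokens)) weights.set(t, f * idf.get(t));
    return weights;
  });
}

const docs = [
  ["two", "sum", "array"],
  ["three", "sum", "array", "sort"],
  ["binary", "tree", "depth"],
];
const weights = tfidf(docs);
console.log(weights[0]);
```

In the first document, "two" appears in only one of the three documents, so it receives a higher weight than "sum", which appears in two.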

2. Normalization and BM25 Scoring

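Normalization: Adjusts scores for document length, so longer documents are not favored simply because they contain more words.
BM25: Saturates the contribution of repeated terms and normalizes by document length, so no single frequently repeated keyword dominates the score.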
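A rough sketch of BM25 scoring follows, to make the length normalization and term saturation concrete. It uses the commonly cited defaults k1 = 1.2 and b = 0.75 and a common smoothed IDF variant; these values, the function names, and the sample documents are assumptions, since the README does not state which parameters CodeMaster uses.

```javascript
// BM25 sketch. Parameter defaults, names, and sample data are assumptions.

// Collection statistics needed by BM25.
function buildStats(docs) {
  const docFreq = new Map();
  let totalLength = 0;
  for (const tokens of docs) {
    totalLength += tokens.length;
    for (const t of new Set(tokens)) docFreq.set(t, (docFreq.get(t) || 0) + 1);
  }
  return { docCount: docs.length, avgDocLength: totalLength / docs.length, docFreq };
}

// Score one document against the query terms.
function bm25Score(queryTerms, docTokens, stats, k1 = 1.2, b = 0.75) {
  const { docCount, avgDocLength, docFreq } = stats;
  let score = 0;
  for (const term of queryTerms) {
    const f = docTokens.filter((t) => t === term).length; // raw term count
    if (f === 0) continue;
    const n = docFreq.get(term) || 0;
    // Smoothed IDF that stays non-negative.
    const idf = Math.log(1 + (docCount - n + 0.5) / (n + 0.5));
    // Documents longer than average are penalized, shorter ones boosted.
    const lengthNorm = 1 - b + b * (docTokens.length / avgDocLength);
    // The fraction grows with f but flattens out (saturation).
    score += (idf * f * (k1 + 1)) / (f + k1 * lengthNorm);
  }
  return score;
}

const bm25Docs = [
  ["two", "sum"],
  ["two", "sum", "array", "hash", "map", "index"],
  ["tree", "depth"],
];
const bm25Stats = buildStats(bm25Docs);
const bm25Scores = bm25Docs.map((d) => bm25Score(["two", "sum"], d, bm25Stats));
console.log(bm25Scores);
```

The first two documents contain each query term once, but the shorter one scores higher because of length normalization; the third contains neither term and scores zero.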

3. Text Processing and Query Handling

Preprocessing: Cleans the query text by removing stopwords and punctuation and converting it to lowercase.
Keyword Extraction: Identifies relevant terms while ignoring unrelated content.
Edge Cases:
- Spelling corrections.
- Numeric conversions (e.g., "two" to "2").
- Splitting camel-cased words (e.g., "twoSum" to "two sum").
- Lemmatization for accurate word matching.
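Several of the steps above can be combined into one small preprocessing function. The sketch below covers camelCase splitting, lowercasing, punctuation and stopword removal, and number-word conversion; spelling correction and lemmatization are left out because they usually need a dictionary or a library. The stopword list, the number-word table, and the function name are illustrative assumptions.

```javascript
// Query preprocessing sketch. The word lists are deliberately tiny and
// are assumptions, not the lists used by CodeMaster.
const STOPWORDS = new Set(["is", "the", "a", "an", "of", "to", "in", "for", "and"]);
const NUMBER_WORDS = new Map([
  ["zero", "0"], ["one", "1"], ["two", "2"], ["three", "3"], ["four", "4"],
  ["five", "5"], ["six", "6"], ["seven", "7"], ["eight", "8"], ["nine", "9"],
  ["ten", "10"],
]);

function preprocess(query) {
  return query
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // split camelCase before lowercasing
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")           // strip punctuation
    .split(/\s+/)
    .filter((t) => t.length > 0 && !STOPWORDS.has(t))
    .map((t) => NUMBER_WORDS.get(t) || t);  // "two" becomes "2"
}

const processed = preprocess("Find the twoSum problem!");
console.log(processed); // [ 'find', '2', 'sum', 'problem' ]
```

For the query and the indexed documents to match, both would need to pass through the same pipeline, so that "two" and "2" end up as the same token on each side.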

4. Search and Ranking

Query Processing: Extracts and processes query keywords, then computes TF-IDF scores.
Cosine Similarity: Measures how closely the query's term-weight vector matches each document's vector.
Title Similarity: Adds weight to documents with titles that closely match the query.
Ranking: Applies BM25 for ranking and displays the top 10 results.
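The ranking step can be illustrated with a short sketch that computes cosine similarity over sparse term-weight vectors, adds a title boost, and keeps the top 10 results. How CodeMaster actually combines these signals with BM25 is not stated in this README, so the additive combination, the titleWeight value, and all names here are assumptions.

```javascript
// Ranking sketch: cosine similarity plus a title-overlap boost.
// The way the two signals are combined is an assumption.

// Cosine similarity between two sparse vectors stored as Maps.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, w] of a) {
    normA += w * w;
    if (b.has(term)) dot += w * b.get(term);
  }
  for (const w of b.values()) normB += w * w;
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Fraction of query tokens that also appear in the title.
function titleOverlap(queryTokens, titleTokens) {
  if (queryTokens.length === 0) return 0;
  const title = new Set(titleTokens);
  return queryTokens.filter((t) => title.has(t)).length / queryTokens.length;
}

// Score every document, sort descending, keep the top `limit`.
function rank(queryVector, queryTokens, documents, titleWeight = 0.5, limit = 10) {
  return documents
    .map((doc) => ({
      title: doc.title,
      score:
        cosineSimilarity(queryVector, doc.vector) +
        titleWeight * titleOverlap(queryTokens, doc.titleTokens),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const documents = [
  { title: "Binary Tree Depth", titleTokens: ["binary", "tree", "depth"],
    vector: new Map([["binary", 0.5], ["tree", 0.5]]) },
  { title: "Two Sum", titleTokens: ["2", "sum"],
    vector: new Map([["2", 0.6], ["sum", 0.4]]) },
];
const queryTokens = ["2", "sum"];
const queryVector = new Map([["2", 0.5], ["sum", 0.5]]);
const results = rank(queryVector, queryTokens, documents);
console.log(results);
```

Here "Two Sum" ranks first because its vector and its title both match the query, while "Binary Tree Depth" shares no terms and scores zero.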

Edge Cases Handled

Stopword Removal: Eliminates common, irrelevant words (e.g., "is," "the").
Punctuation Removal: Avoids mismatches by stripping punctuation.
Spell Checking: Corrects misspelled query terms.
Lemmatization: Matches words by reducing them to their base forms (e.g., "running" to "run").
Number-Word Conversion: Treats numeric forms and their word equivalents as identical.
CamelCase Handling: Splits camel-cased words for better keyword matching.
Title Similarity: Considers document titles as part of the ranking process.
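Spell checking is the least self-explanatory item in this list, so here is one possible approach: replace an unknown query term with the closest vocabulary term by Levenshtein edit distance. The README does not say which method CodeMaster uses; the technique, the distance threshold, and the names below are assumptions.

```javascript
// Spelling-correction sketch based on Levenshtein distance.
// This technique is an assumption, not necessarily CodeMaster's method.

// Classic dynamic-programming edit distance using a single row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value from the previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(
        above + 1,                                  // deletion
        row[j - 1] + 1,                             // insertion
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution or match
      );
      diagonal = above;
    }
  }
  return row[b.length];
}

// Return the closest vocabulary term within maxDistance, else the input.
function correctTerm(term, vocabulary, maxDistance = 2) {
  if (vocabulary.has(term)) return term;
  let best = term;
  let bestDistance = maxDistance + 1;
  for (const candidate of vocabulary) {
    const d = editDistance(term, candidate);
    if (d < bestDistance) {
      best = candidate;
      bestDistance = d;
    }
  }
  return best;
}

const vocabulary = new Set(["binary", "tree", "array", "sum"]);
console.log(correctTerm("arary", vocabulary)); // "array"
console.log(correctTerm("xyzzy", vocabulary)); // unchanged: nothing is close enough
```

A linear scan over the vocabulary is fine for a small keyword list; a larger vocabulary would likely need an index such as a BK-tree or n-gram lookup.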

Performance Optimization

RAM-Based Indexing: Stores precomputed TF-IDF values in files and loads them into memory at startup, so queries are answered from memory without recomputing scores or reading from disk.
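The load-at-startup pattern might look like the sketch below. The on-disk format (one numeric value per line) and the demo file name are hypothetical, since the README does not describe how the index files are laid out.

```javascript
// Index-loading sketch. The file format (one number per line) and the
// demo file name are hypothetical, chosen only for illustration.
const fs = require("fs");
const os = require("os");
const path = require("path");

// Read a file of numeric values into an in-memory array.
function loadNumbers(filePath) {
  return fs
    .readFileSync(filePath, "utf8")
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "") // skip blank lines
    .map(Number);
}

// Demo: write a small file to a temp directory, then load it once,
// as a server would do at startup before accepting queries.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "index-"));
const demoFile = path.join(demoDir, "idf-demo.txt");
fs.writeFileSync(demoFile, "0.405\n1.099\n\n0.0\n");

const idfValues = loadNumbers(demoFile);
console.log(idfValues); // [ 0.405, 1.099, 0 ]
```

Reading synchronously is acceptable here because it happens once, before the application starts serving requests; request handlers then only touch the in-memory arrays.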

How to Use

Install Dependencies: Install the required Node.js packages (for a standard Node.js project, running npm install in the project root).
Start the Application: Launch the search engine and input your queries to retrieve ranked results.
Explore Results: View the top 10 most relevant documents based on your search.

Technologies Used

Backend: Node.js
Frontend: EJS template engine
Text Processing: TF-IDF, BM25
Storage: RAM-based indexing for fast access

Future Enhancements

Add support for more data sources and DSA platforms.
Optimize the scoring algorithms for even larger datasets.
Implement a user-friendly interface for better accessibility.
