- Element: Models either directories or webpages.
- WebSite: Represents a website and provides methods for managing its structure.
WebSite(host)
: Creates a new WebSite object for saving the website hosted at the given host.

getHomePage()
: Returns the home page of the website.

getSiteString()
: Returns a string showing the structure of the website.

insertPage(url, content)
: Saves and returns a new page of the website.

getSiteFromPage(page)
: Given a page, returns the WebSite object it belongs to.

__hasDir(ndir, cdir)
: Checks whether a directory exists in the current directory.

__newDir(ndir, cdir)
: Creates a new directory if it does not already exist.

__hasPage(npag, cdir)
: Checks whether a webpage exists in the current directory.

__newPage(npag, cdir)
: Creates a new webpage if it does not already exist.

__isDir(elem)
: Checks whether an element is a directory.

__isPage(elem)
: Checks whether an element is a webpage.
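
The WebSite operations above can be sketched in Python roughly as follows. This is a minimal sketch under several assumptions: URLs have the form `host/dir1/.../page`, each directory keeps its children in a list sorted by `(name, is_dir)` so that existence checks are binary searches, and `_locate` and `_get_or_create` are hypothetical helpers standing in for `__hasDir`/`__hasPage` and `__newDir`/`__newPage`. `getHomePage` is left out because the README does not say which page counts as the home page.

```python
from bisect import bisect_left


class Element:
    """Hypothetical node of the site tree: either a directory or a webpage."""

    def __init__(self, name, is_dir, content=None):
        self.name = name
        self.is_dir = is_dir
        self.content = content  # page text; None for directories
        self.keys = []          # sorted (name, is_dir) keys of the children
        self.children = []      # child elements, parallel to self.keys
        self.site = None        # owning WebSite, used by getSiteFromPage


class WebSite:
    def __init__(self, host):
        self.host = host
        self.root = Element(host, is_dir=True)
        self.root.site = self

    def _locate(self, name, is_dir, cdir):
        # Hypothetical helper: binary search over cdir's sorted keys, O(log n).
        key = (name, is_dir)
        i = bisect_left(cdir.keys, key)
        return i, i < len(cdir.keys) and cdir.keys[i] == key

    def _get_or_create(self, name, is_dir, cdir):
        # Hypothetical helper: reuse an existing child or insert a new one
        # at its sorted position (the list insert itself shifts elements).
        i, found = self._locate(name, is_dir, cdir)
        if found:
            return cdir.children[i]
        elem = Element(name, is_dir)
        elem.site = self
        cdir.keys.insert(i, (name, is_dir))
        cdir.children.insert(i, elem)
        return elem

    def insertPage(self, url, content):
        # Assumed URL shape: host/dir1/.../dirN/page
        parts = url.split("/")
        if parts[0] != self.host:
            raise ValueError(f"{url!r} is not hosted at {self.host!r}")
        cdir = self.root
        for dname in parts[1:-1]:
            cdir = self._get_or_create(dname, True, cdir)
        page = self._get_or_create(parts[-1], False, cdir)
        page.content = content
        return page

    @staticmethod
    def getSiteFromPage(page):
        # Each element stores a back-reference to its WebSite.
        return page.site

    def getSiteString(self):
        # Depth-first walk visiting each element once: linear in the site size.
        lines = []

        def walk(elem, depth):
            lines.append("  " * depth + elem.name)
            for child in elem.children:
                walk(child, depth + 1)

        walk(self.root, 0)
        return "\n".join(lines)
```

For example, inserting `www.example.com/docs/intro.html` and then `www.example.com/index.html` creates the `docs` directory once and lists the root's children in sorted order, one indentation level per directory depth.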
- InvertedIndex: Represents the core data structure of the search engine.
InvertedIndex()
: Creates a new, empty InvertedIndex.

addWord(keyword)
: Adds a keyword to the InvertedIndex.

addPage(page)
: Processes a webpage and updates the inverted index with the words it contains.

getList(keyword)
: Retrieves the occurrence list for the given keyword.
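
A dictionary-based sketch of the InvertedIndex interface is shown below. It assumes that an occurrence list maps each page to the number of times the keyword appears in it, and that words are lowercased runs of word characters (`\w+`). The `Page` dataclass is a hypothetical stand-in for the webpage elements managed by WebSite. A hash map gives average constant-time updates and lookups, so this sketch does not reproduce the complexity bounds stated for this project; it is one possible design, not the original one.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical stand-in for a webpage: its URL and text content."""
    url: str
    content: str


class InvertedIndex:
    def __init__(self):
        # keyword -> {page: occurrences of keyword in that page}
        self._index = {}

    def addWord(self, keyword):
        # Register the keyword with an empty occurrence list if it is new.
        self._index.setdefault(keyword.lower(), {})

    def addPage(self, page):
        # Assumed tokenization: lowercase runs of word characters.
        counts = Counter(re.findall(r"\w+", page.content.lower()))
        for word, n in counts.items():
            self.addWord(word)
            self._index[word][page] = n

    def getList(self, keyword):
        # Occurrence list for keyword, or None if it was never indexed.
        return self._index.get(keyword.lower())
```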
- SearchEngine: Answers keyword queries over a collection of webpages.
SearchEngine(namedir)
: Initializes the SearchEngine from a directory containing webpage files.

search(keyword, k)
: Returns the k web pages with the most occurrences of the keyword.
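
The ranking step of `search(keyword, k)` can be sketched with the standard library's `heapq.nlargest`, assuming the keyword's occurrence list is available as a mapping from page to occurrence count. `top_k_pages` is a hypothetical helper; loading the files in `namedir` is not shown because the README does not describe their format.

```python
import heapq


def top_k_pages(occurrences, k):
    """Hypothetical helper: up to k (page, count) pairs, highest count first.

    `occurrences` is assumed to map each page to the number of times the
    keyword appears in it.
    """
    if not occurrences or k <= 0:
        return []
    # nlargest maintains a heap of size k: O(n log k) for n candidate pages.
    return heapq.nlargest(k, occurrences.items(), key=lambda item: item[1])
```

Using a bounded heap rather than sorting every candidate page keeps the cost close to linear when k is much smaller than the number of matching pages.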
- Constant time complexity for various operations.
- Linear time complexity for generating site structure.
- Logarithmic time complexity for directory and page existence checks.
- Linear time complexity for adding keywords and retrieving occurrence lists.
- The implementation is designed to keep both website organization and search queries efficient.
- A test dataset is provided for evaluating the correctness and performance of the code.