A Model Context Protocol (MCP) server providing specialized tools for AI agents, including advanced web scraping capabilities that bypass anti-bot detection mechanisms.
- Web Fetching: Playwright-based web scraper that handles bot detection
- Site-Specific Strategies: Optimized scraping for popular sites (Baeldung, Medium)
- Anti-Bot Detection: Advanced evasion techniques including user agent spoofing and JavaScript injection
- Extensible Architecture: Easy to add new tools and strategies
- Docker Ready: Containerized deployment for easy integration
- Docker (recommended) or Node.js 18+
- Claude Desktop or any MCP-compatible client
- Build the Docker image:
  docker build -t mcp-tools-server:latest .
- Add to Claude Desktop:
  claude mcp add-json mcp-tools-server '{"command": "docker", "args": ["run", "--rm", "-i", "mcp-tools-server:latest"]}'
- Clone and install dependencies:
  git clone <repository-url>
  cd mcp-tools-server
  npm install
- Build the project:
  npm run build
- Add to Claude Desktop:
  claude mcp add-json mcp-tools-server '{"command": "node", "args": ["dist/index.js"], "cwd": "/path/to/mcp-tools-server"}'
Fetches web content using Playwright to bypass anti-bot measures and extract clean text content.
Parameters:
- `url` (required): The URL to fetch
- `timeout` (optional): Timeout in milliseconds (default: 30000)
- `userAgent` (optional): Custom user agent string
- `waitForSelector` (optional): CSS selector to wait for before extracting content
Example Usage:
{
  "tool": "web-fetcher",
  "arguments": {
    "url": "https://www.baeldung.com/java-collections",
    "timeout": 45000,
    "waitForSelector": ".post-content"
  }
}
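The parameter list and example call above could be modelled roughly as follows. This is a minimal sketch, not the server's actual code: the names `WebFetcherArgs` and `normalizeWebFetcherArgs` are hypothetical, and the restriction to `http`/`https` URLs is an assumption. Only the 30000 ms default comes from the parameter list.

```typescript
// Hypothetical shape of the web-fetcher arguments, mirroring the parameter list.
interface WebFetcherArgs {
  url: string;
  timeout?: number;
  userAgent?: string;
  waitForSelector?: string;
}

const DEFAULT_TIMEOUT_MS = 30000; // documented default

// Validate raw tool arguments and fill in the default timeout.
function normalizeWebFetcherArgs(raw: Record<string, unknown>): WebFetcherArgs {
  const url = raw["url"];
  if (typeof url !== "string" || url.length === 0) {
    throw new Error("url is required");
  }
  // new URL() throws on malformed input, so bad URLs are rejected early.
  const parsed = new URL(url);
  // Assumption: only http(s) pages are worth fetching.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("url must use http or https");
  }

  const timeout = raw["timeout"] === undefined ? DEFAULT_TIMEOUT_MS : raw["timeout"];
  if (typeof timeout !== "number" || !Number.isFinite(timeout) || timeout <= 0) {
    throw new Error("timeout must be a positive number of milliseconds");
  }

  const result: WebFetcherArgs = { url: parsed.href, timeout };
  const userAgent = raw["userAgent"];
  if (typeof userAgent === "string") result.userAgent = userAgent;
  const waitForSelector = raw["waitForSelector"];
  if (typeof waitForSelector === "string") result.waitForSelector = waitForSelector;
  return result;
}
```

Normalizing once at the tool boundary means the fetch logic can rely on `timeout` always being set.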
Supported Sites with Enhanced Strategies:
- Baeldung.com: Optimized content extraction with code block formatting
- Medium.com: Article-specific scraping with clean text extraction
- General Sites: Fallback strategy for any website
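One plausible way to read the list above is "first matching site strategy wins, otherwise use the general fallback". The sketch below illustrates that dispatch under stated assumptions: `SiteStrategy`, `pickStrategy`, and the `label` field are hypothetical stand-ins, and only `canHandle` mirrors the `FetchStrategy` interface this README describes.

```typescript
// Hypothetical stand-in for a fetch strategy; only canHandle is modelled here.
interface SiteStrategy {
  label: string;
  canHandle(url: string): boolean;
}

// Return the first registered strategy that claims the URL, else the fallback.
function pickStrategy(
  url: string,
  registered: SiteStrategy[],
  fallback: SiteStrategy
): SiteStrategy {
  for (const strategy of registered) {
    if (strategy.canHandle(url)) return strategy;
  }
  return fallback;
}

// Illustrative registrations; order matters because the first match wins.
const demoStrategies: SiteStrategy[] = [
  { label: "baeldung", canHandle: (u) => u.includes("baeldung.com") },
  { label: "medium", canHandle: (u) => u.includes("medium.com") },
];
const generalFallback: SiteStrategy = { label: "general", canHandle: () => true };
```

For example, `pickStrategy("https://example.org/", demoStrategies, generalFallback)` falls through both site strategies and returns the general one.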
src/
├── index.ts                          # MCP server entry point
├── tools/
│   ├── base/
│   │   └── Tool.ts                   # Base tool interface
│   └── web-fetcher/
│       ├── WebFetcher.ts             # Main web fetcher implementation
│       └── strategies/               # Site-specific strategies
│           ├── BaeldungStrategy.ts
│           ├── MediumStrategy.ts
│           └── index.ts
├── types/
│   └── index.ts                      # TypeScript type definitions
├── config/                           # Configuration files
└── utils/                            # Shared utilities
- Node.js 18+
- TypeScript
- Playwright
# Install dependencies
npm install
# Start development server with hot reload
npm run dev
# Run type checking
npm run type-check
# Build for production
npm run build
- Create your tool class:
  import { BaseTool } from '../base/Tool.js';

  export class YourTool extends BaseTool {
    name = 'your-tool';
    description = 'Description of your tool';

    getSchema() {
      // Define MCP tool schema
    }

    async execute(args: any) {
      // Implement tool logic
    }
  }
- Register in index.ts:
  import { YourTool } from './tools/your-tool/YourTool.js';

  const yourTool = new YourTool();
  // Add to tools list and handlers
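To make the two placeholder methods more concrete, here is a minimal sketch of a filled-in tool. It assumes a simplified base class (`SketchBaseTool`), since the real `BaseTool` in `src/tools/base/Tool.ts` is not shown here and may differ. `EchoTool` and `ToolSchema` are hypothetical names. The `name` / `description` / `inputSchema` layout follows the MCP tool definition format, where `inputSchema` is a JSON Schema object.

```typescript
// Simplified schema type for an MCP tool definition (assumption: flat string properties only).
interface ToolSchema {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

// Stand-in for the project's BaseTool; the real class may have a different shape.
abstract class SketchBaseTool {
  abstract name: string;
  abstract description: string;
  abstract getSchema(): ToolSchema;
  abstract execute(args: Record<string, unknown>): Promise<string>;
}

class EchoTool extends SketchBaseTool {
  name = "echo";
  description = "Returns the text it was given";

  // Describe the accepted arguments so an MCP client can validate calls.
  getSchema(): ToolSchema {
    return {
      name: this.name,
      description: this.description,
      inputSchema: {
        type: "object",
        properties: {
          text: { type: "string", description: "Text to echo back" },
        },
        required: ["text"],
      },
    };
  }

  // Check the argument at runtime, since callers are not type-checked.
  async execute(args: Record<string, unknown>): Promise<string> {
    const text = args["text"];
    if (typeof text !== "string") throw new Error("text must be a string");
    return text;
  }
}
```

Typing `args` as `Record<string, unknown>` rather than `any` forces the runtime check in `execute`, which is useful because tool arguments arrive from an external client.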
- Implement the FetchStrategy interface:
  import { FetchStrategy, WebFetchRequest, WebFetchResponse } from '../../types/index.js';

  export class YourSiteStrategy implements FetchStrategy {
    canHandle(url: string): boolean {
      return url.includes('yoursite.com');
    }

    async fetch(request: WebFetchRequest): Promise<WebFetchResponse> {
      // Site-specific implementation
    }
  }
- Register the strategy:
  const strategies = [
    new BaeldungStrategy(),
    new MediumStrategy(),
    new YourSiteStrategy() // Add here
  ];
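A substring check such as `url.includes('yoursite.com')` also matches URLs that merely mention the domain, for example in a query string. If that matters for your strategy, a hostname comparison is one possible refinement. The helper below is a sketch with a hypothetical name (`matchesDomain`) and is not part of the project.

```typescript
// Returns true when the URL's hostname is `domain` itself or one of its subdomains.
function matchesDomain(url: string, domain: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch (_err) {
    return false; // not a parseable absolute URL
  }
  return hostname === domain || hostname.endsWith("." + domain);
}
```

Inside a strategy, `canHandle` could then return `matchesDomain(url, 'yoursite.com')`, which accepts `https://www.yoursite.com/post` but rejects `https://example.org/?ref=yoursite.com`.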
# Build the image
npm run docker:build
# Run locally
npm run docker:run
# Use docker-compose
npm run docker:compose:up
GitHub Actions automatically builds Docker images and pushes them to Docker Hub when a PR is merged to main.
Manual deployment:
# Tag and push to registry
npm run docker:push
To enable automatic Docker image publishing, configure the following secrets in your GitHub repository:
- `DOCKER_USERNAME`: Your Docker Hub username
- `DOCKER_PASSWORD`: Your Docker Hub password or access token
Setting up secrets:
- Go to your GitHub repository → Settings → Secrets and variables → Actions
- Click "New repository secret"
- Add both `DOCKER_USERNAME` and `DOCKER_PASSWORD`
On PR Merge:
- Build multi-platform Docker images (AMD64 and ARM64)
- Tag images with branch name, commit SHA, and `latest`
- Push to `{your-username}/mcp-tools-server` on Docker Hub
Manual Release (GitHub Actions → Run workflow):
- Version Bump: Choose `patch`, `minor`, or `major` to automatically increment the `package.json` version
- Use Package Version: Uses the current `package.json` version for the Docker tag (e.g., `v1.2.3`)
- Tag as Latest: Optionally tag as `latest` in addition to the version tag
- GitHub Release: Automatically creates a GitHub release with Docker pull instructions
- Go to Actions → Build and Push Docker Image → Run workflow
- Select version bump type (`patch` for bug fixes, `minor` for features, `major` for breaking changes)
- Choose whether to tag as `latest`
- The workflow will:
  - Bump `package.json` version
  - Commit and push the version change
  - Build and tag Docker image with new version
  - Create GitHub release with:
    - Docker pull command for the specific version
    - Claude MCP installation command with version-pinned Docker image
    - Links to Docker Hub and installation guide
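The bump types follow the usual semantic versioning arithmetic. The sketch below shows what each choice does to a version string and how the resulting Docker tags might be derived. It is an illustration only and not the workflow's implementation: `bumpVersion` and `dockerTags` are hypothetical names, and pre-release suffixes are deliberately not handled.

```typescript
type BumpKind = "patch" | "minor" | "major";

// Increment a plain MAJOR.MINOR.PATCH version, resetting the lower parts.
function bumpVersion(version: string, kind: BumpKind): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error("not a plain MAJOR.MINOR.PATCH version: " + version);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  if (kind === "major") return `${major + 1}.0.0`;
  if (kind === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

// Build the tag list: a v-prefixed version tag, plus "latest" when requested.
function dockerTags(version: string, tagAsLatest: boolean): string[] {
  const tags = ["v" + version];
  if (tagAsLatest) tags.push("latest");
  return tags;
}
```

For instance, a `minor` bump of `1.2.3` yields `1.3.0`, and with "tag as latest" enabled the image would carry both `v1.3.0` and `latest`.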
- Non-root container execution for enhanced security
- Anti-bot detection evasion without malicious intent
- Sandboxed browser execution with security flags
- Minimal attack surface with multi-stage Docker builds
The server currently uses sensible defaults but can be extended with configuration files in the `src/config/` directory for:
- Custom timeouts
- Default user agents
- Strategy-specific settings
- Rate limiting (planned)
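Since the configuration layer is described as an extension point, its shape is open. The sketch below shows one way defaults and overrides could be combined. Every name in it (`FetcherConfig`, `resolveFetcherConfig`, the placeholder user agent, the per-strategy `waitForSelector` setting) is an assumption for illustration. Only the 30000 ms timeout matches the documented default.

```typescript
// Hypothetical configuration shape covering timeouts, user agent, and per-strategy settings.
interface FetcherConfig {
  timeoutMs: number;
  userAgent: string;
  strategies: Record<string, { waitForSelector?: string }>;
}

const defaultFetcherConfig: FetcherConfig = {
  timeoutMs: 30000, // matches the documented default timeout
  userAgent: "ExampleAgent/1.0", // placeholder, not the server's real default
  strategies: {},
};

// Overlay user overrides on the defaults; strategy settings are merged per key.
function resolveFetcherConfig(overrides: Partial<FetcherConfig>): FetcherConfig {
  return {
    ...defaultFetcherConfig,
    ...overrides,
    strategies: {
      ...defaultFetcherConfig.strategies,
      ...(overrides.strategies ?? {}),
    },
  };
}
```

Calling `resolveFetcherConfig({ timeoutMs: 45000 })` would change only the timeout and leave the other defaults in place.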
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Docker build fails:
- Ensure Docker has enough memory allocated (4GB+ recommended)
- Try building with the `--no-cache` flag
Playwright crashes:
- Verify the container has sufficient memory
- Check if running in a sandboxed environment that blocks browser execution
MCP connection issues:
- Verify the tool is properly registered in Claude Desktop
- Check that the Docker container starts without errors
- Ensure the container can access the internet for web fetching
# Enable debug logging
DEBUG=* npm run dev
MIT License - see LICENSE file for details