ArXiv MCP Server: Add enhanced LLM-friendly tool descriptions and restored blazickjp functionality #171

Open
wants to merge 8 commits into
base: main

jasonleinart
Contributor


Summary

This PR builds on the previous merge (#170) by restoring the full functionality of the original blazickjp repository. It addresses the core issue raised in feedback: LLMs would skip the ArXiv tools entirely.

What's New in This Update

🔧 Restored Core Functionality

  • VALID_CATEGORIES validation system - Complete arXiv category validation
  • Enhanced helper functions - _validate_categories(), _optimize_query(), _build_date_filter()
  • Advanced sorting options - Sort by relevance or date
  • Comprehensive error handling - Improved logging and error messages
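
As a rough illustration of the validation and date-filter helpers listed above, here is a minimal sketch. The function names echo the list, but the signatures, the bodies, the prefix subset, and the 1991 default start date are all assumptions, not the code in this PR. The `submittedDate:[... TO ...]` range syntax follows the arXiv API manual.

```python
from datetime import date

# Hypothetical subset of arXiv archive prefixes; the real VALID_CATEGORIES is larger.
VALID_CATEGORY_PREFIXES = {"cs", "math", "stat", "physics", "q-bio", "q-fin", "eess", "econ"}


def validate_categories(categories):
    """Return True when every category has a known prefix (e.g. "cs.LG" -> "cs")."""
    for category in categories:
        prefix = category.split(".", 1)[0]
        if prefix not in VALID_CATEGORY_PREFIXES:
            return False
    return True


def build_date_filter(date_from=None, date_to=None):
    """Build an arXiv submittedDate range clause, or "" when no bound is given."""
    if date_from is None and date_to is None:
        return ""
    # Assumed defaults: open-ended ranges start in 1991 and end today.
    start = (date_from or date(1991, 1, 1)).strftime("%Y%m%d") + "0000"
    end = (date_to or date.today()).strftime("%Y%m%d") + "2359"
    return f"submittedDate:[{start} TO {end}]"
```

Rejecting unknown prefixes before calling the API lets the server return a clear error message instead of an empty result set.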

🧠 LLM-Friendly Tool Descriptions

The enhanced search tool now includes detailed
documentation that LLMs can understand:

description: |
  QUERY CONSTRUCTION GUIDELINES:
  - Use QUOTED PHRASES for exact matches: "multi-agent systems", "neural networks"
  - Field-specific searches: ti:"exact title phrase", au:"author name"
  - Advanced patterns: au:"Hinton" AND "deep learning" with categories: ["cs.LG"]

  EXAMPLES OF EFFECTIVE QUERIES:
  - ti:"reinforcement learning" with categories: ["cs.LG", "cs.AI"]
  - "multi-agent" ANDNOT "survey" with categories: ["cs.MA"]
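
To make the query patterns above concrete, the sketch below shows one way a query string and a category list could be combined into a single arXiv API `search_query` value. The function name `build_search_query` and its behavior are hypothetical, not taken from the server. Only the `cat:` field prefix, the boolean operators, and parenthesized grouping are arXiv API features.

```python
def build_search_query(query, categories=None):
    """Combine a user query with an optional OR-joined category filter."""
    query = query.strip()
    if not categories:
        return query
    # Each category becomes a cat: clause; any one of them may match.
    category_clause = " OR ".join(f"cat:{c}" for c in categories)
    if not query:
        return f"({category_clause})"
    # Parentheses keep the user's own AND/ANDNOT logic intact.
    return f"({query}) AND ({category_clause})"


print(build_search_query('ti:"reinforcement learning"', ["cs.LG", "cs.AI"]))
# (ti:"reinforcement learning") AND (cat:cs.LG OR cat:cs.AI)
```

Wrapping the user query in parentheses is a deliberate choice here: it stops the appended category filter from changing the precedence of operators the caller wrote.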

Problem Solved

Addresses the specific feedback issue:
"LLMs never try to use the tools in this server, because they cannot understand what they do or how to use them. When I ask an AI to 'search ArXiv for recent AI advancements' it skips directly to searching DuckDuckGo."

Technical Changes

- Source repository: Points to enhanced jasonleinart/arxiv-mcp-server with restored blazickjp functionality
- Enhanced descriptions: 2,400+ character tool descriptions with concrete examples
- Category validation: Complete arXiv category prefix validation
- Query optimization: Intelligent query processing while preserving user intent
Testing & Verification

- ✅ Production tested: Actual paper downloads and processing verified
- ✅ Docker container: Full functionality tested in production environment
- ✅ LLM integration: Tool descriptions verified to guide LLM usage effectively
- ✅ Local catalog: Successfully generated and validated
Impact

This update is intended to make LLMs choose and use the ArXiv tools instead of falling back to web search, giving researchers proper academic paper discovery and analysis capabilities.

jasonleinart and others added 8 commits July 10, 2025 09:28
Features:
• Search arXiv papers with advanced filtering (date, category, query)
• Download and store papers locally as markdown
• Read and analyze paper content programmatically
• Deep research analysis prompts for comprehensive paper review
• Configurable local storage path for paper management

Technical Details:
- Category: search (academic research focused)
- License: Apache 2.0 (compatible with Docker registry)
- Source: https://github.com/jasonleinart/arxiv-mcp-server
- Build Status: Successfully tested and validated
- Docker Image: mcp/arxiv-mcp-server

Target Users:
Perfect for researchers, academics, and AI assistants conducting
literature reviews, research analysis, and academic paper exploration.

Differs from the existing paper-search server by providing:
- ArXiv-specific optimizations and direct API integration
- Local paper storage and management capabilities
- Research-focused analysis prompts and workflows
- Advanced filtering and categorization options
- Maps user storage path to /app/papers inside container
- Enables file persistence for downloaded papers
- Fixes ephemeral container issue where files weren't saved
- Follows pattern used by kubectl-mcp-server and elevenlabs
- Essential for Docker MCP Toolkit volume mounting
- Change from parameter template to environment variable approach
- Docker MCP Toolkit processes  but not {{parameter}} templates
- Maps storage_path parameter to STORAGE_PATH env var for volume mounting
- Should resolve volume mounting issue where files weren't persisting
- Change volume mapping from $STORAGE_PATH to {{arxiv-mcp-server.storage_path}} template syntax
- Add ARXIV_STORAGE_PATH environment variable for internal container path
- Fixes volume mounting issue identified by @cmrigney in PR review
- Follows Docker MCP Registry configuration standards per docs/configuration.md
…nment variable

- Remove redundant STORAGE_PATH environment variable as suggested by @cmrigney
- Keep only ARXIV_STORAGE_PATH with '/app/papers' value for container internal path
- Simplifies configuration while maintaining full functionality
- Volume mounting handles path mapping: {{arxiv-mcp-server.storage_path}}:/app/papers
- ArXiv server auto-detects Docker environment and uses correct storage path
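
For illustration, here is a minimal sketch of how a server could pick its storage directory from the ARXIV_STORAGE_PATH variable described above. This is a simpler stand-in for the Docker auto-detection the commit mentions, not the server's actual logic. The function name `resolve_storage_path` and the fallback directory are hypothetical.

```python
import os
from pathlib import Path


def resolve_storage_path(env=None):
    """Return the paper storage directory, preferring ARXIV_STORAGE_PATH."""
    env = os.environ if env is None else env
    configured = env.get("ARXIV_STORAGE_PATH")
    if configured:
        # Inside the container this would be /app/papers, the volume mount target.
        return Path(configured)
    # Hypothetical fallback for non-Docker runs.
    return Path.home() / ".arxiv-mcp-server" / "papers"
```

With the volume mapped as {{arxiv-mcp-server.storage_path}}:/app/papers, anything written under the returned path inside the container persists on the host.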
…models

- Add enhanced tool descriptions optimized for local AI model compatibility
- Improve Docker MCP Gateway integration with detailed context
- Address community feedback about sparse tool descriptions causing confusion
- Add tags for better discoverability: local-models, docker-gateway, enhanced-descriptions

This update resolves issues where local AI models (Llama, Mistral, etc.) were
getting confused by minimal tool descriptions when using Docker MCP Gateway.

Fixes: docker/mcp-gateway#93
…tionality

- Point source to jasonleinart/arxiv-mcp-server with enhanced search functionality
- Updated description to highlight LLM-friendly tool descriptions
- Restored comprehensive query construction guidelines and examples
- Added enhanced tool descriptions that resolve LLM tool usage issues
- Docker will build from enhanced repository while maintaining mcp/arxiv-mcp-server image

This addresses the feedback issue where LLMs would skip using the tools due to
generic descriptions. The enhanced version includes detailed examples and concrete
usage patterns that LLMs can understand and utilize effectively.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
jasonleinart requested a review from a team as a code owner August 20, 2025 18:45