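Background
I use r.jina.ai to parse web pages and PDF files and send the result to a downstream LLM for analysis. Recently r.jina.ai has been responding more and more slowly and frequently returns 404 errors, so I plan to run the parsing locally instead.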
Deployment
https://github.com/intergalacticalvariable/reader
docker pull ghcr.io/intergalacticalvariable/reader:latest
docker run -d -p 3000:3000 -v /path/to/local-storage:/app/local-storage --name reader-container ghcr.io/intergalacticalvariable/reader:latest
docker-compose.yml
version: '3'
services:
  reader-service:
    image: ghcr.io/intergalacticalvariable/reader:latest
    container_name: r_jina_ai
    ports:
      - "5005:3000"
    volumes:
      - /data/docker_data/r_jina_ai:/app/local-storage
    restart: unless-stopped
Usage
Once the Docker container is running, you can use curl to make requests. Here are examples for different response types:
- 📝 Markdown (bypasses readability processing):
curl -H "X-Respond-With: markdown" 'https://jina.0ms.net/https://google.com'
- 🌐 HTML (returns documentElement.outerHTML):
curl -H "X-Respond-With: html" 'https://jina.0ms.net/https://google.com'
- 📄 Text (returns document.body.innerText):
curl -H "X-Respond-With: text" 'https://jina.0ms.net/https://google.com'
- 📸 Screen-Size Screenshot (returns the URL of the webpage’s screenshot):
curl -H "X-Respond-With: screenshot" 'https://jina.0ms.net/https://google.com'
- 📸 Full-Page Screenshot (returns the URL of the webpage’s screenshot):
curl -H "X-Respond-With: pageshot" 'https://jina.0ms.net/https://google.com'
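Since the parsed output is meant for a downstream LLM, the curl calls above could be wrapped in a small Python helper. This is only a minimal sketch: the function names `build_reader_request` and `fetch`, the default timeout, and the choice of `urllib` are assumptions for illustration, not part of the reader project. The base URL and the `X-Respond-With` values are taken from the curl examples.

```python
import urllib.request

# Assumption: base URL of the self-hosted reader, copied from the curl examples.
READER_BASE = "https://jina.0ms.net"

# Response types listed above for the X-Respond-With header.
RESPOND_WITH = ("markdown", "html", "text", "screenshot", "pageshot")


def build_reader_request(target_url, respond_with="markdown", base=READER_BASE):
    """Build (without sending) a GET request for the reader service."""
    if respond_with not in RESPOND_WITH:
        raise ValueError(f"unsupported response type: {respond_with!r}")
    # The target URL is appended to the base as-is, mirroring the curl examples.
    full_url = base.rstrip("/") + "/" + target_url
    return urllib.request.Request(full_url, headers={"X-Respond-With": respond_with})


def fetch(target_url, respond_with="markdown", base=READER_BASE, timeout=60):
    """Send the request and return the response body decoded as UTF-8 text."""
    req = build_reader_request(target_url, respond_with, base)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `fetch("https://google.com", "text")` would correspond to the Text curl call. When talking directly to a container started from the docker-compose.yml above, pass `base="http://localhost:5005"` to match its port mapping.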