Spaces: mycompanyajt/inference

nurulajt committed 23 days ago

Commit 8da8945 · verified · 1 Parent(s): e80adbc

Upload 4 files

Files changed (4)

Dockerfile +25 -0
README.md +277 -10
api.py +242 -0
requirements.txt +9 -0

Dockerfile ADDED

	@@ -0,0 +1,25 @@

+FROM python:3.11-slim
+WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+# Copy requirements and install Python dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY api.py .
+# Expose port (Hugging Face Spaces uses 7860 by default)
+EXPOSE 7860
+# Set environment variable for port
+ENV PORT=7860
+# Run the API server
+CMD ["python", "api.py"]
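
The Dockerfile exports `PORT=7860`, and `api.py` reads that variable back when it starts. A minimal sketch of the lookup follows. `resolve_port` and `DEFAULT_PORT` are hypothetical names, not part of the commit, and the environment is passed in as a parameter so the function can be exercised without touching the real process environment.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 7860  # the value the Dockerfile sets via ENV PORT


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Return the port from env["PORT"], falling back to DEFAULT_PORT."""
    if env is None:
        env = os.environ
    # Environment values are strings, so convert before handing to the server.
    return int(env.get("PORT", DEFAULT_PORT))


if __name__ == "__main__":
    print(resolve_port({}))
    print(resolve_port({"PORT": "8000"}))
```

Overriding the port at run time would then only need `-e PORT=...` plus a matching `-p` mapping on `docker run`.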

README.md CHANGED

@@ -1,10 +1,277 @@
----
-title: Inference
-emoji: 🚀
-colorFrom: blue
-colorTo: green
-sdk: docker
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Embedding Inference API
+A FastAPI-based inference service for generating embeddings using JobBERT v2/v3, Jina AI, and Voyage AI.
+## Features
+- **Multiple Models**: JobBERT v2/v3 (job-specific), Jina AI v3 (general-purpose), Voyage AI (state-of-the-art)
+- **RESTful API**: Easy-to-use HTTP endpoints
+- **Batch Processing**: Process multiple texts in a single request
+- **Task-Specific Embeddings**: Support for different embedding tasks (retrieval, classification, etc.)
+- **Docker Ready**: Easy deployment to Hugging Face Spaces or any Docker environment
+## Supported Models
+| Model | Dimension | Max Tokens | Best For |
+|-------|-----------|------------|----------|
+| JobBERT v2 | 768 | 512 | Job titles and descriptions |
+| JobBERT v3 | 768 | 512 | Job titles (improved performance) |
+| Jina AI v3 | 1024 | 8,192 | General text, long documents |
+| Voyage AI | 1024 | 32,000 | High-quality embeddings (requires API key) |
+## Quick Start
+### Local Development
+1. **Install dependencies:**
+   ```bash
+   cd embedding
+   pip install -r requirements.txt
+   ```
+2. **Run the API:**
+   ```bash
+   python api.py
+   ```
+3. **Access the API:**
+   - API: http://localhost:7860
+   - Docs: http://localhost:7860/docs
+### Docker Deployment
+1. **Build the image:**
+   ```bash
+   docker build -t embedding-api .
+   ```
+2. **Run the container:**
+   ```bash
+   docker run -p 7860:7860 embedding-api
+   ```
+3. **With Voyage AI (optional):**
+   ```bash
+   docker run -p 7860:7860 -e VOYAGE_API_KEY=your_key_here embedding-api
+   ```
+## Hugging Face Spaces Deployment
+### Option 1: Using Hugging Face CLI
+1. **Install Hugging Face CLI:**
+   ```bash
+   pip install huggingface_hub
+   huggingface-cli login
+   ```
+2. **Create a new Space:**
+   - Go to https://huggingface.co/spaces
+   - Click "Create new Space"
+   - Choose "Docker" as the Space SDK
+   - Name your space (e.g., `your-username/embedding-api`)
+3. **Clone and push:**
+   ```bash
+   git clone https://huggingface.co/spaces/your-username/embedding-api
+   cd embedding-api
+   # Copy files from embedding folder
+   cp /path/to/embedding/Dockerfile .
+   cp /path/to/embedding/api.py .
+   cp /path/to/embedding/requirements.txt .
+   cp /path/to/embedding/README.md .
+   git add .
+   git commit -m "Initial commit"
+   git push
+   ```
+4. **Configure environment (optional):**
+   - Go to your Space settings
+   - Add `VOYAGE_API_KEY` secret if using Voyage AI
+### Option 2: Manual Upload
+1. Create a new Docker Space on Hugging Face
+2. Upload these files:
+   - `Dockerfile`
+   - `api.py`
+   - `requirements.txt`
+   - `README.md`
+3. Add environment variables in Settings if needed
+## API Usage
+### Health Check
+```bash
+curl http://localhost:7860/health
+```
+Response:
+```json
+{
+  "status": "healthy",
+  "models_loaded": ["jobbertv2", "jobbertv3", "jina"],
+  "voyage_available": false
+}
+```
+### Generate Embeddings
+#### JobBERT v2 (Job Titles)
+```bash
+curl -X POST http://localhost:7860/embed \
+  -H "Content-Type: application/json" \
+  -d '{
+    "texts": ["Software Engineer", "Data Scientist", "Product Manager"],
+    "model": "jobbertv2"
+  }'
+```
+#### JobBERT v3 (Latest, Recommended)
+```bash
+curl -X POST http://localhost:7860/embed \
+  -H "Content-Type: application/json" \
+  -d '{
+    "texts": ["Software Engineer", "Data Scientist", "Product Manager"],
+    "model": "jobbertv3"
+  }'
+```
+#### Jina AI (with task specification)
+```bash
+curl -X POST http://localhost:7860/embed \
+  -H "Content-Type: application/json" \
+  -d '{
+    "texts": ["What is machine learning?", "How does AI work?"],
+    "model": "jina",
+    "task": "retrieval.query"
+  }'
+```
+**Jina AI Tasks:**
+- `retrieval.query`: For search queries
+- `retrieval.passage`: For documents
+- `text-matching`: For similarity (default)
+- `classification`: For classification
+- `separation`: For clustering
+#### Voyage AI (requires API key)
+```bash
+curl -X POST http://localhost:7860/embed \
+  -H "Content-Type: application/json" \
+  -d '{
+    "texts": ["This is a document to embed"],
+    "model": "voyage",
+    "input_type": "document"
+  }'
+```
+**Voyage AI Input Types:**
+- `document`: For documents/passages
+- `query`: For search queries
+### Response Format
+```json
+{
+  "embeddings": [
+    [0.123, -0.456, 0.789, ...],
+    [0.234, -0.567, 0.890, ...]
+  ],
+  "model": "jobbertv2",
+  "dimension": 768,
+  "num_texts": 2
+}
+```
+### List Available Models
+```bash
+curl http://localhost:7860/models
+```
+## Python Client Example
+```python
+import requests
+url = "http://localhost:7860/embed"
+# JobBERT v3 (recommended)
+response = requests.post(url, json={
+    "texts": ["Software Engineer", "Data Scientist"],
+    "model": "jobbertv3"
+})
+result = response.json()
+embeddings = result["embeddings"]
+print(f"Got {len(embeddings)} embeddings of dimension {result['dimension']}")
+# JobBERT v2
+response = requests.post(url, json={
+    "texts": ["Product Manager"],
+    "model": "jobbertv2"
+})
+# Jina AI with task
+response = requests.post(url, json={
+    "texts": ["What is Python?"],
+    "model": "jina",
+    "task": "retrieval.query"
+})
+# Voyage AI
+response = requests.post(url, json={
+    "texts": ["Document text here"],
+    "model": "voyage",
+    "input_type": "document"
+})
+```
+## Environment Variables
+- `PORT`: Server port (default: 7860)
+- `VOYAGE_API_KEY`: Voyage AI API key (optional, required for Voyage embeddings)
+## Interactive Documentation
+Once the API is running, visit:
+- **Swagger UI**: http://localhost:7860/docs
+- **ReDoc**: http://localhost:7860/redoc
+## Notes
+- Models are downloaded automatically on first startup (~2-3GB total)
+- Voyage AI requires an API key from https://www.voyageai.com/
+- All local models are loaded at startup, so the server may take some time before it accepts requests
+- Use batch processing for better performance (send multiple texts at once)
+## Troubleshooting
+### Models not loading
+- Check available disk space (need ~3GB)
+- Ensure internet connection for model download
+- Check logs for specific error messages
+### Voyage AI not working
+- Verify `VOYAGE_API_KEY` is set correctly
+- Check API key has sufficient credits
+- Ensure `voyageai` package is installed
+### Out of memory
+- Reduce batch size (process fewer texts per request)
+- Use smaller models (JobBERT v2 instead of Jina)
+- Increase container memory limits
+## License
+This API uses models with different licenses:
+- JobBERT v2/v3: Apache 2.0
+- Jina AI: Apache 2.0
+- Voyage AI: Subject to Voyage AI terms of service
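
The README's Notes and Troubleshooting sections recommend sending several texts per request, and reducing the batch size when memory runs out. One way a client could do both is to split its input into fixed-size batches and build one `/embed` body per batch. The sketch below assumes the request fields shown in the README (`texts`, `model`, `task`). `build_payloads` and the default `batch_size` of 32 are hypothetical choices for illustration.

```python
from typing import Dict, Iterator, List, Optional


def build_payloads(
    texts: List[str],
    model: str,
    batch_size: int = 32,
    task: Optional[str] = None,
) -> Iterator[Dict[str, object]]:
    """Yield one /embed request body per batch of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        payload: Dict[str, object] = {
            "texts": texts[start:start + batch_size],
            "model": model,
        }
        if task is not None:
            payload["task"] = task  # only used by the "jina" model
        yield payload


if __name__ == "__main__":
    titles = ["Software Engineer", "Data Scientist", "Product Manager"]
    for body in build_payloads(titles, model="jobbertv3", batch_size=2):
        print(body)
```

Each body can be posted with `requests.post(url, json=body)`, and the returned `embeddings` lists concatenated in order to recover one vector per input text.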

api.py ADDED

	@@ -0,0 +1,242 @@

+"""
+Embedding Inference API
+Supports JobBERT v2/v3, Jina AI, and Voyage AI embeddings
+"""
+from fastapi import FastAPI, HTTPException
+from fastapi.middleware.cors import CORSMiddleware
+from pydantic import BaseModel, Field
+from typing import List, Optional
+from sentence_transformers import SentenceTransformer
+import os
+import logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+app = FastAPI(
+    title="Embedding Inference API",
+    description="Generate embeddings using JobBERT v2/v3, Jina AI, or Voyage AI",
+    version="1.0.0"
+)
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    # Credentials cannot be combined with a wildcard origin, so they stay disabled.
+    allow_credentials=False,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+MODELS = {}
+VOYAGE_API_KEY = os.environ.get('VOYAGE_API_KEY', '')
+voyage_client = None
+if VOYAGE_API_KEY:
+    try:
+        import voyageai
+        voyage_client = voyageai.Client(api_key=VOYAGE_API_KEY)
+        logger.info("✓ Voyage AI client initialized")
+    except ImportError:
+        logger.warning("⚠️  voyageai package not installed")
+    except Exception as e:
+        logger.warning(f"⚠️  Voyage AI initialization failed: {e}")
+def load_models():
+    """Load embedding models on startup"""
+    try:
+        logger.info("Loading JobBERT-v2...")
+        MODELS['jobbertv2'] = SentenceTransformer('TechWolf/JobBERT-v2')
+        logger.info("✓ JobBERT-v2 loaded")
+        logger.info("Loading JobBERT-v3...")
+        MODELS['jobbertv3'] = SentenceTransformer('TechWolf/JobBERT-v3')
+        logger.info("✓ JobBERT-v3 loaded")
+        logger.info("Loading Jina AI embeddings-v3...")
+        MODELS['jina'] = SentenceTransformer('jinaai/jina-embeddings-v3', trust_remote_code=True)
+        logger.info("✓ Jina AI v3 loaded")
+        logger.info("All models loaded successfully!")
+    except Exception as e:
+        logger.error(f"Error loading models: {e}")
+        raise
+@app.on_event("startup")
+async def startup_event():
+    load_models()
+class EmbeddingRequest(BaseModel):
+    texts: List[str] = Field(..., description="List of texts to embed", min_length=1)
+    model: str = Field(..., description="Model to use: 'jobbertv2', 'jobbertv3', 'jina', or 'voyage'")
+    task: Optional[str] = Field(None, description="Task type for Jina AI: 'retrieval.query', 'retrieval.passage', 'text-matching', etc.")
+    input_type: Optional[str] = Field(None, description="Input type for Voyage AI: 'document' or 'query'")
+    model_config = {
+        "json_schema_extra": {
+            "example": {
+                "texts": ["Software Engineer", "Data Scientist"],
+                "model": "jobbertv3",
+                "task": "text-matching"
+            }
+        }
+    }
+class EmbeddingResponse(BaseModel):
+    embeddings: List[List[float]] = Field(..., description="List of embedding vectors")
+    model: str = Field(..., description="Model used")
+    dimension: int = Field(..., description="Embedding dimension")
+    num_texts: int = Field(..., description="Number of texts processed")
+class HealthResponse(BaseModel):
+    status: str
+    models_loaded: List[str]
+    voyage_available: bool
+@app.get("/", response_model=dict)
+async def root():
+    """Root endpoint with API information"""
+    return {
+        "message": "Embedding Inference API",
+        "version": "1.0.0",
+        "endpoints": {
+            "/health": "Health check and available models",
+            "/embed": "Generate embeddings (POST)",
+            "/docs": "API documentation"
+        }
+    }
+@app.get("/health", response_model=HealthResponse)
+async def health():
+    """Health check endpoint"""
+    models_loaded = list(MODELS.keys())
+    return {
+        "status": "healthy",
+        "models_loaded": models_loaded,
+        "voyage_available": voyage_client is not None
+    }
+@app.post("/embed", response_model=EmbeddingResponse)
+async def create_embeddings(request: EmbeddingRequest):
+    """
+    Generate embeddings for input texts
+    **Models:**
+    - `jobbertv2`: JobBERT-v2 (768-dim, job-specific)
+    - `jobbertv3`: JobBERT-v3 (768-dim, job-specific, improved performance)
+    - `jina`: Jina AI embeddings-v3 (1024-dim, general purpose, supports task types)
+    - `voyage`: Voyage AI (1024-dim, requires API key)
+    **Jina AI Tasks:**
+    - `retrieval.query`: For search queries
+    - `retrieval.passage`: For documents/passages
+    - `text-matching`: For similarity matching (default)
+    - `classification`: For classification tasks
+    - `separation`: For clustering
+    **Voyage AI Input Types:**
+    - `document`: For documents/passages
+    - `query`: For search queries
+    """
+    model_name = request.model.lower()
+    if model_name == "voyage":
+        if not voyage_client:
+            raise HTTPException(
+                status_code=503,
+                detail="Voyage AI not available. Set VOYAGE_API_KEY environment variable."
+            )
+        try:
+            input_type = request.input_type or "document"
+            result = voyage_client.embed(
+                texts=request.texts,
+                model="voyage-3",
+                input_type=input_type
+            )
+            embeddings = result.embeddings
+            dimension = len(embeddings[0]) if embeddings else 0
+            return EmbeddingResponse(
+                embeddings=embeddings,
+                model="voyage-3",
+                dimension=dimension,
+                num_texts=len(request.texts)
+            )
+        except Exception as e:
+            raise HTTPException(status_code=500, detail=f"Voyage AI error: {str(e)}")
+    elif model_name in MODELS:
+        try:
+            model = MODELS[model_name]
+            if model_name == "jina" and request.task:
+                embeddings = model.encode(
+                    request.texts,
+                    task=request.task,
+                    convert_to_numpy=True
+                )
+            else:
+                embeddings = model.encode(
+                    request.texts,
+                    convert_to_numpy=True
+                )
+            embeddings_list = embeddings.tolist()
+            dimension = len(embeddings_list[0]) if embeddings_list else 0
+            return EmbeddingResponse(
+                embeddings=embeddings_list,
+                model=model_name,
+                dimension=dimension,
+                num_texts=len(request.texts)
+            )
+        except Exception as e:
+            raise HTTPException(status_code=500, detail=f"Model error: {str(e)}")
+    else:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Invalid model '{model_name}'. Choose from: jobbertv2, jobbertv3, jina, voyage"
+        )
+@app.get("/models")
+async def list_models():
+    """List available models and their specifications"""
+    models_info = {
+        "jobbertv2": {
+            "name": "TechWolf/JobBERT-v2",
+            "dimension": 768,
+            "description": "Job-specific BERT model fine-tuned on job titles",
+            "max_tokens": 512,
+            "available": "jobbertv2" in MODELS
+        },
+        "jobbertv3": {
+            "name": "TechWolf/JobBERT-v3",
+            "dimension": 768,
+            "description": "Latest JobBERT model with improved performance",
+            "max_tokens": 512,
+            "available": "jobbertv3" in MODELS
+        },
+        "jina": {
+            "name": "jinaai/jina-embeddings-v3",
+            "dimension": 1024,
+            "description": "General-purpose embeddings with long context support",
+            "max_tokens": 8192,
+            "available": "jina" in MODELS,
+            "tasks": ["retrieval.query", "retrieval.passage", "text-matching", "classification", "separation"]
+        },
+        "voyage": {
+            "name": "voyage-3",
+            "dimension": 1024,
+            "description": "State-of-the-art embeddings (requires API key)",
+            "max_tokens": 32000,
+            "available": voyage_client is not None,
+            "input_types": ["document", "query"]
+        }
+    }
+    return models_info
+if __name__ == "__main__":
+    import uvicorn
+    port = int(os.environ.get("PORT", 7860))
+    uvicorn.run(app, host="0.0.0.0", port=port)
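
The `/embed` handler picks a backend in three steps: `voyage` is checked first and answers 503 when no client is configured, any name found in `MODELS` is served locally, and everything else is rejected with 400. The sketch below isolates that branching so it can be tested without loading any model. `route_request` and its `(status, backend)` return value are hypothetical and do not appear in `api.py`.

```python
from typing import Collection, Tuple


def route_request(
    model: str,
    loaded_models: Collection[str],
    voyage_available: bool,
) -> Tuple[int, str]:
    """Return (HTTP status, backend) for a requested model name."""
    name = model.lower()  # the endpoint lowercases the name before matching
    if name == "voyage":
        # Voyage is served by a remote client, so it is checked before MODELS.
        return (200, "voyage") if voyage_available else (503, "voyage")
    if name in loaded_models:
        return (200, "local")
    return (400, "unknown")


if __name__ == "__main__":
    loaded = {"jobbertv2", "jobbertv3", "jina"}
    print(route_request("JobBERTv3", loaded, voyage_available=False))
    print(route_request("voyage", loaded, voyage_available=False))
    print(route_request("bert", loaded, voyage_available=True))
```

Keeping the decision in a pure function like this is one option for unit-testing the status codes separately from the model calls.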

requirements.txt ADDED

	@@ -0,0 +1,9 @@

+fastapi>=0.104.0
+uvicorn[standard]>=0.24.0
+pydantic>=2.0.0
+sentence-transformers>=3.0.0
+torch>=2.0.0
+transformers>=4.30.0
+numpy<2.0.0
+voyageai>=0.2.0
+einops>=0.6.0
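
The requirements file mixes lower bounds (`>=`) with one upper bound (`numpy<2.0.0`). A rough way to check an installed version against such a line is sketched below, using only the standard library. It is a simplification under stated assumptions: `parse_requirement` and `satisfies` are hypothetical helpers, only plain dotted numeric versions are handled, and pre-release tags or multi-clause specifiers are out of scope.

```python
import re
from typing import Tuple

# name, optional [extras], one comparison operator, dotted numeric version
_SPEC = re.compile(
    r"^([A-Za-z0-9_.-]+)(?:\[[^\]]*\])?\s*(>=|<=|==|<|>)\s*([0-9]+(?:\.[0-9]+)*)$"
)


def parse_requirement(line: str) -> Tuple[str, str, str]:
    """Split a line such as 'uvicorn[standard]>=0.24.0' into (name, op, version)."""
    match = _SPEC.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    return match.group(1), match.group(2), match.group(3)


def _version_tuple(version: str, width: int) -> Tuple[int, ...]:
    parts = [int(p) for p in version.split(".")]
    # Pad with zeros so that "2.0" and "2.0.0" compare as equal.
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(installed: str, op: str, required: str) -> bool:
    """Check a dotted numeric version against a single comparison."""
    width = max(installed.count("."), required.count(".")) + 1
    a = _version_tuple(installed, width)
    b = _version_tuple(required, width)
    return {">=": a >= b, "<=": a <= b, "==": a == b, "<": a < b, ">": a > b}[op]


if __name__ == "__main__":
    name, op, required = parse_requirement("numpy<2.0.0")
    print(name, satisfies("1.26.4", op, required))
```

The installed version could be obtained with `importlib.metadata.version(name)` from the standard library.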