nurulajt committed (verified)
Commit 8da8945 · 1 Parent(s): e80adbc

Upload 4 files

Files changed (4)
  1. Dockerfile +25 -0
  2. README.md +277 -10
  3. api.py +242 -0
  4. requirements.txt +9 -0
Dockerfile ADDED
@@ -0,0 +1,25 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements and install Python dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY api.py .
+
+ # Expose port (Hugging Face Spaces uses 7860 by default)
+ EXPOSE 7860
+
+ # Set environment variable for port
+ ENV PORT=7860
+
+ # Run the API server
+ CMD ["python", "api.py"]
README.md CHANGED
@@ -1,10 +1,277 @@
- ---
- title: Inference
- emoji: 🚀
- colorFrom: blue
- colorTo: green
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Embedding Inference API
+
+ A FastAPI-based inference service for generating embeddings using JobBERT v2/v3, Jina AI, and Voyage AI.
+
+ ## Features
+
+ - **Multiple Models**: JobBERT v2/v3 (job-specific), Jina AI v3 (general-purpose), Voyage AI (state-of-the-art)
+ - **RESTful API**: Easy-to-use HTTP endpoints
+ - **Batch Processing**: Process multiple texts in a single request
+ - **Task-Specific Embeddings**: Support for different embedding tasks (retrieval, classification, etc.)
+ - **Docker Ready**: Easy deployment to Hugging Face Spaces or any Docker environment
+
+ ## Supported Models
+
+ | Model | Dimension | Max Tokens | Best For |
+ |-------|-----------|------------|----------|
+ | JobBERT v2 | 768 | 512 | Job titles and descriptions |
+ | JobBERT v3 | 768 | 512 | Job titles (improved performance) |
+ | Jina AI v3 | 1024 | 8,192 | General text, long documents |
+ | Voyage AI | 1024 | 32,000 | High-quality embeddings (requires API key) |
+
+ ## Quick Start
+
+ ### Local Development
+
+ 1. **Install dependencies:**
+    ```bash
+    cd embedding
+    pip install -r requirements.txt
+    ```
+
+ 2. **Run the API:**
+    ```bash
+    python api.py
+    ```
+
+ 3. **Access the API:**
+    - API: http://localhost:7860
+    - Docs: http://localhost:7860/docs
+
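+ For a quick smoke test, the root endpoint returns basic API information (response as implemented in `api.py`):
+
+ ```bash
+ curl http://localhost:7860/
+ ```
+
+ ```json
+ {
+   "message": "Embedding Inference API",
+   "version": "1.0.0",
+   "endpoints": {
+     "/health": "Health check and available models",
+     "/embed": "Generate embeddings (POST)",
+     "/docs": "API documentation"
+   }
+ }
+ ```
+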
+ ### Docker Deployment
+
+ 1. **Build the image:**
+    ```bash
+    docker build -t embedding-api .
+    ```
+
+ 2. **Run the container:**
+    ```bash
+    docker run -p 7860:7860 embedding-api
+    ```
+
+ 3. **With Voyage AI (optional):**
+    ```bash
+    docker run -p 7860:7860 -e VOYAGE_API_KEY=your_key_here embedding-api
+    ```
+
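+ To avoid re-downloading the models (~2-3GB) on every container start, you can optionally mount a Hugging Face cache directory. This assumes the default cache location `~/.cache/huggingface` on the host and `/root/.cache/huggingface` inside the container:
+
+ ```bash
+ docker run -p 7860:7860 \
+   -v ~/.cache/huggingface:/root/.cache/huggingface \
+   embedding-api
+ ```
+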
+ ## Hugging Face Spaces Deployment
+
+ ### Option 1: Using Hugging Face CLI
+
+ 1. **Install the Hugging Face CLI:**
+    ```bash
+    pip install huggingface_hub
+    huggingface-cli login
+    ```
+
+ 2. **Create a new Space:**
+    - Go to https://huggingface.co/spaces
+    - Click "Create new Space"
+    - Choose "Docker" as the Space SDK
+    - Name your space (e.g., `your-username/embedding-api`)
+
+ 3. **Clone and push:**
+    ```bash
+    git clone https://huggingface.co/spaces/your-username/embedding-api
+    cd embedding-api
+
+    # Copy files from the embedding folder
+    cp /path/to/embedding/Dockerfile .
+    cp /path/to/embedding/api.py .
+    cp /path/to/embedding/requirements.txt .
+    cp /path/to/embedding/README.md .
+
+    git add .
+    git commit -m "Initial commit"
+    git push
+    ```
+
+ 4. **Configure the environment (optional):**
+    - Go to your Space settings
+    - Add a `VOYAGE_API_KEY` secret if using Voyage AI
+
+ ### Option 2: Manual Upload
+
+ 1. Create a new Docker Space on Hugging Face
+ 2. Upload these files:
+    - `Dockerfile`
+    - `api.py`
+    - `requirements.txt`
+    - `README.md`
+ 3. Add environment variables in Settings if needed
+
+ ## API Usage
+
+ ### Health Check
+
+ ```bash
+ curl http://localhost:7860/health
+ ```
+
+ Response:
+ ```json
+ {
+   "status": "healthy",
+   "models_loaded": ["jobbertv2", "jobbertv3", "jina"],
+   "voyage_available": false
+ }
+ ```
+
+ ### Generate Embeddings
+
+ #### JobBERT v2 (Job Titles)
+
+ ```bash
+ curl -X POST http://localhost:7860/embed \
+   -H "Content-Type: application/json" \
+   -d '{
+     "texts": ["Software Engineer", "Data Scientist", "Product Manager"],
+     "model": "jobbertv2"
+   }'
+ ```
+
+ #### JobBERT v3 (Latest, Recommended)
+
+ ```bash
+ curl -X POST http://localhost:7860/embed \
+   -H "Content-Type: application/json" \
+   -d '{
+     "texts": ["Software Engineer", "Data Scientist", "Product Manager"],
+     "model": "jobbertv3"
+   }'
+ ```
+
+ #### Jina AI (with task specification)
+
+ ```bash
+ curl -X POST http://localhost:7860/embed \
+   -H "Content-Type: application/json" \
+   -d '{
+     "texts": ["What is machine learning?", "How does AI work?"],
+     "model": "jina",
+     "task": "retrieval.query"
+   }'
+ ```
+
+ **Jina AI Tasks:**
+ - `retrieval.query`: For search queries
+ - `retrieval.passage`: For documents
+ - `text-matching`: For similarity (default)
+ - `classification`: For classification
+ - `separation`: For clustering
+
+ #### Voyage AI (requires API key)
+
+ ```bash
+ curl -X POST http://localhost:7860/embed \
+   -H "Content-Type: application/json" \
+   -d '{
+     "texts": ["This is a document to embed"],
+     "model": "voyage",
+     "input_type": "document"
+   }'
+ ```
+
+ **Voyage AI Input Types:**
+ - `document`: For documents/passages
+ - `query`: For search queries
+
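+ If an unsupported model name is sent, the API responds with HTTP 400 and the error detail defined in `api.py`, for example:
+
+ ```json
+ {
+   "detail": "Invalid model 'foo'. Choose from: jobbertv2, jobbertv3, jina, voyage"
+ }
+ ```
+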
+ ### Response Format
+
+ ```json
+ {
+   "embeddings": [
+     [0.123, -0.456, 0.789, ...],
+     [0.234, -0.567, 0.890, ...]
+   ],
+   "model": "jobbertv2",
+   "dimension": 768,
+   "num_texts": 2
+ }
+ ```
+
+ ### List Available Models
+
+ ```bash
+ curl http://localhost:7860/models
+ ```
+
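+ The response describes each model; an abridged example based on the `/models` handler in `api.py` (availability flags depend on what loaded at startup):
+
+ ```json
+ {
+   "jobbertv3": {
+     "name": "TechWolf/JobBERT-v3",
+     "dimension": 768,
+     "description": "Latest JobBERT model with improved performance",
+     "max_tokens": 512,
+     "available": true
+   },
+   "voyage": {
+     "name": "voyage-3",
+     "dimension": 1024,
+     "description": "State-of-the-art embeddings (requires API key)",
+     "max_tokens": 32000,
+     "available": false,
+     "input_types": ["document", "query"]
+   }
+ }
+ ```
+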
+ ## Python Client Example
+
+ ```python
+ import requests
+
+ url = "http://localhost:7860/embed"
+
+ # JobBERT v3 (recommended)
+ response = requests.post(url, json={
+     "texts": ["Software Engineer", "Data Scientist"],
+     "model": "jobbertv3"
+ })
+ result = response.json()
+ embeddings = result["embeddings"]
+ print(f"Got {len(embeddings)} embeddings of dimension {result['dimension']}")
+
+ # JobBERT v2
+ response = requests.post(url, json={
+     "texts": ["Product Manager"],
+     "model": "jobbertv2"
+ })
+
+ # Jina AI with task
+ response = requests.post(url, json={
+     "texts": ["What is Python?"],
+     "model": "jina",
+     "task": "retrieval.query"
+ })
+
+ # Voyage AI
+ response = requests.post(url, json={
+     "texts": ["Document text here"],
+     "model": "voyage",
+     "input_type": "document"
+ })
+ ```
+
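+ A common next step is comparing the returned vectors. A minimal sketch using NumPy (already in `requirements.txt`) to rank job titles against a query by cosine similarity:
+
+ ```python
+ import numpy as np
+ import requests
+
+ url = "http://localhost:7860/embed"
+ titles = ["Software Engineer", "Data Scientist", "Product Manager"]
+
+ # Embed the query and the candidate titles in one batch
+ payload = {"texts": ["Machine Learning Engineer"] + titles, "model": "jobbertv3"}
+ vectors = np.array(requests.post(url, json=payload).json()["embeddings"])
+
+ query_vec, title_vecs = vectors[0], vectors[1:]
+
+ # Cosine similarity = dot product of L2-normalized vectors
+ def normalize(x):
+     return x / np.linalg.norm(x, axis=-1, keepdims=True)
+
+ scores = normalize(title_vecs) @ normalize(query_vec)
+ for title, score in sorted(zip(titles, scores), key=lambda t: -t[1]):
+     print(f"{score:.3f}  {title}")
+ ```
+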
237
+ ## Environment Variables
238
+
239
+ - `PORT`: Server port (default: 7860)
240
+ - `VOYAGE_API_KEY`: Voyage AI API key (optional, required for Voyage embeddings)
241
+
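+ For local runs, these can be set in the shell before starting the server (the key value below is a placeholder):
+
+ ```bash
+ export VOYAGE_API_KEY=your_key_here
+ export PORT=7860
+ python api.py
+ ```
+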
242
+ ## Interactive Documentation
243
+
244
+ Once the API is running, visit:
245
+ - **Swagger UI**: http://localhost:7860/docs
246
+ - **ReDoc**: http://localhost:7860/redoc
247
+
248
+ ## Notes
249
+
250
+ - Models are downloaded automatically on first startup (~2-3GB total)
251
+ - Voyage AI requires an API key from https://www.voyageai.com/
252
+ - First request to each model may be slower due to model loading
253
+ - Use batch processing for better performance (send multiple texts at once)
254
+
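+ For the batching tip above, a minimal client-side sketch (the batch size of 32 is an arbitrary illustrative choice, not a server requirement):
+
+ ```python
+ import requests
+
+ def embed_in_batches(texts, model="jobbertv3", batch_size=32,
+                      url="http://localhost:7860/embed"):
+     """Send texts to /embed in chunks and collect all embeddings."""
+     embeddings = []
+     for start in range(0, len(texts), batch_size):
+         batch = texts[start:start + batch_size]
+         resp = requests.post(url, json={"texts": batch, "model": model})
+         resp.raise_for_status()
+         embeddings.extend(resp.json()["embeddings"])
+     return embeddings
+
+ # Example: embed 100 job titles in chunks of 32
+ vectors = embed_in_batches([f"Engineer level {i}" for i in range(100)])
+ print(len(vectors), "embeddings")
+ ```
+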
255
+ ## Troubleshooting
256
+
257
+ ### Models not loading
258
+ - Check available disk space (need ~3GB)
259
+ - Ensure internet connection for model download
260
+ - Check logs for specific error messages
261
+
262
+ ### Voyage AI not working
263
+ - Verify `VOYAGE_API_KEY` is set correctly
264
+ - Check API key has sufficient credits
265
+ - Ensure `voyageai` package is installed
266
+
267
+ ### Out of memory
268
+ - Reduce batch size (process fewer texts per request)
269
+ - Use smaller models (JobBERT v2 instead of Jina)
270
+ - Increase container memory limits
271
+
272
+ ## License
273
+
274
+ This API uses models with different licenses:
275
+ - JobBERT v2/v3: Apache 2.0
276
+ - Jina AI: Apache 2.0
277
+ - Voyage AI: Subject to Voyage AI terms of service
api.py ADDED
@@ -0,0 +1,242 @@
+ """
+ Embedding Inference API
+ Supports JobBERT v2/v3, Jina AI, and Voyage AI embeddings
+ """
+
+ from fastapi import FastAPI, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel, ConfigDict, Field
+ from typing import List, Optional
+ from sentence_transformers import SentenceTransformer
+ import os
+ import logging
+
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ app = FastAPI(
+     title="Embedding Inference API",
+     description="Generate embeddings using JobBERT v2/v3, Jina AI, or Voyage AI",
+     version="1.0.0"
+ )
+
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ MODELS = {}
+ VOYAGE_API_KEY = os.environ.get('VOYAGE_API_KEY', '')
+ voyage_client = None
+
+ if VOYAGE_API_KEY:
+     try:
+         import voyageai
+         voyage_client = voyageai.Client(api_key=VOYAGE_API_KEY)
+         logger.info("✓ Voyage AI client initialized")
+     except ImportError:
+         logger.warning("⚠️ voyageai package not installed")
+     except Exception as e:
+         logger.warning(f"⚠️ Voyage AI initialization failed: {e}")
+
+ def load_models():
+     """Load embedding models on startup"""
+     try:
+         logger.info("Loading JobBERT-v2...")
+         MODELS['jobbertv2'] = SentenceTransformer('TechWolf/JobBERT-v2')
+         logger.info("✓ JobBERT-v2 loaded")
+
+         logger.info("Loading JobBERT-v3...")
+         MODELS['jobbertv3'] = SentenceTransformer('TechWolf/JobBERT-v3')
+         logger.info("✓ JobBERT-v3 loaded")
+
+         logger.info("Loading Jina AI embeddings-v3...")
+         MODELS['jina'] = SentenceTransformer('jinaai/jina-embeddings-v3', trust_remote_code=True)
+         logger.info("✓ Jina AI v3 loaded")
+
+         logger.info("All models loaded successfully!")
+     except Exception as e:
+         logger.error(f"Error loading models: {e}")
+         raise
+
+ @app.on_event("startup")
+ async def startup_event():
+     load_models()
+
+ class EmbeddingRequest(BaseModel):
+     texts: List[str] = Field(..., description="List of texts to embed", min_length=1)
+     model: str = Field(..., description="Model to use: 'jobbertv2', 'jobbertv3', 'jina', or 'voyage'")
+     task: Optional[str] = Field(None, description="Task type for Jina AI: 'retrieval.query', 'retrieval.passage', 'text-matching', etc.")
+     input_type: Optional[str] = Field(None, description="Input type for Voyage AI: 'document' or 'query'")
+
+     # Pydantic v2: 'schema_extra' in class Config was renamed to 'json_schema_extra'
+     model_config = ConfigDict(json_schema_extra={
+         "example": {
+             "texts": ["Software Engineer", "Data Scientist"],
+             "model": "jobbertv3",
+             "task": "text-matching"
+         }
+     })
+
+ class EmbeddingResponse(BaseModel):
85
+ embeddings: List[List[float]] = Field(..., description="List of embedding vectors")
86
+ model: str = Field(..., description="Model used")
87
+ dimension: int = Field(..., description="Embedding dimension")
88
+ num_texts: int = Field(..., description="Number of texts processed")
89
+
90
+ class HealthResponse(BaseModel):
91
+ status: str
92
+ models_loaded: List[str]
93
+ voyage_available: bool
94
+
95
+ @app.get("/", response_model=dict)
96
+ async def root():
97
+ """Root endpoint with API information"""
98
+ return {
99
+ "message": "Embedding Inference API",
100
+ "version": "1.0.0",
101
+ "endpoints": {
102
+ "/health": "Health check and available models",
103
+ "/embed": "Generate embeddings (POST)",
104
+ "/docs": "API documentation"
105
+ }
106
+ }
107
+
108
+ @app.get("/health", response_model=HealthResponse)
109
+ async def health():
110
+ """Health check endpoint"""
111
+ models_loaded = list(MODELS.keys())
112
+ return {
113
+ "status": "healthy",
114
+ "models_loaded": models_loaded,
115
+ "voyage_available": voyage_client is not None
116
+ }
117
+
118
+ @app.post("/embed", response_model=EmbeddingResponse)
119
+ async def create_embeddings(request: EmbeddingRequest):
120
+ """
121
+ Generate embeddings for input texts
122
+
123
+ **Models:**
124
+ - `jobbertv2`: JobBERT-v2 (768-dim, job-specific)
125
+ - `jobbertv3`: JobBERT-v3 (768-dim, job-specific, improved performance)
126
+ - `jina`: Jina AI embeddings-v3 (1024-dim, general purpose, supports task types)
127
+ - `voyage`: Voyage AI (1024-dim, requires API key)
128
+
129
+ **Jina AI Tasks:**
130
+ - `retrieval.query`: For search queries
131
+ - `retrieval.passage`: For documents/passages
132
+ - `text-matching`: For similarity matching (default)
133
+ - `classification`: For classification tasks
134
+ - `separation`: For clustering
135
+
136
+ **Voyage AI Input Types:**
137
+ - `document`: For documents/passages
138
+ - `query`: For search queries
139
+ """
140
+ model_name = request.model.lower()
141
+
142
+ if model_name == "voyage":
143
+ if not voyage_client:
144
+ raise HTTPException(
145
+ status_code=503,
146
+ detail="Voyage AI not available. Set VOYAGE_API_KEY environment variable."
147
+ )
148
+
149
+ try:
150
+ input_type = request.input_type or "document"
151
+ result = voyage_client.embed(
152
+ texts=request.texts,
153
+ model="voyage-3",
154
+ input_type=input_type
155
+ )
156
+ embeddings = result.embeddings
157
+ dimension = len(embeddings[0]) if embeddings else 0
158
+
159
+ return EmbeddingResponse(
160
+ embeddings=embeddings,
161
+ model="voyage-3",
162
+ dimension=dimension,
163
+ num_texts=len(request.texts)
164
+ )
165
+ except Exception as e:
166
+ raise HTTPException(status_code=500, detail=f"Voyage AI error: {str(e)}")
167
+
168
+ elif model_name in MODELS:
169
+ try:
170
+ model = MODELS[model_name]
171
+
172
+ if model_name == "jina" and request.task:
173
+ embeddings = model.encode(
174
+ request.texts,
175
+ task=request.task,
176
+ convert_to_numpy=True
177
+ )
178
+ else:
179
+ embeddings = model.encode(
180
+ request.texts,
181
+ convert_to_numpy=True
182
+ )
183
+
184
+ embeddings_list = embeddings.tolist()
185
+ dimension = len(embeddings_list[0]) if embeddings_list else 0
186
+
187
+ return EmbeddingResponse(
188
+ embeddings=embeddings_list,
189
+ model=model_name,
190
+ dimension=dimension,
191
+ num_texts=len(request.texts)
192
+ )
193
+ except Exception as e:
194
+ raise HTTPException(status_code=500, detail=f"Model error: {str(e)}")
195
+
196
+ else:
197
+ raise HTTPException(
198
+ status_code=400,
199
+ detail=f"Invalid model '{model_name}'. Choose from: jobbertv2, jobbertv3, jina, voyage"
200
+ )
201
+
202
+ @app.get("/models")
203
+ async def list_models():
204
+ """List available models and their specifications"""
205
+ models_info = {
206
+ "jobbertv2": {
207
+ "name": "TechWolf/JobBERT-v2",
208
+ "dimension": 768,
209
+ "description": "Job-specific BERT model fine-tuned on job titles",
210
+ "max_tokens": 512,
211
+ "available": "jobbertv2" in MODELS
212
+ },
213
+ "jobbertv3": {
214
+ "name": "TechWolf/JobBERT-v3",
215
+ "dimension": 768,
216
+ "description": "Latest JobBERT model with improved performance",
217
+ "max_tokens": 512,
218
+ "available": "jobbertv3" in MODELS
219
+ },
220
+ "jina": {
221
+ "name": "jinaai/jina-embeddings-v3",
222
+ "dimension": 1024,
223
+ "description": "General-purpose embeddings with long context support",
224
+ "max_tokens": 8192,
225
+ "available": "jina" in MODELS,
226
+ "tasks": ["retrieval.query", "retrieval.passage", "text-matching", "classification", "separation"]
227
+ },
228
+ "voyage": {
229
+ "name": "voyage-3",
230
+ "dimension": 1024,
231
+ "description": "State-of-the-art embeddings (requires API key)",
232
+ "max_tokens": 32000,
233
+ "available": voyage_client is not None,
234
+ "input_types": ["document", "query"]
235
+ }
236
+ }
237
+ return models_info
238
+
239
+ if __name__ == "__main__":
240
+ import uvicorn
241
+ port = int(os.environ.get("PORT", 7860))
242
+ uvicorn.run(app, host="0.0.0.0", port=port)
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ fastapi>=0.104.0
+ uvicorn[standard]>=0.24.0
+ pydantic>=2.0.0
+ sentence-transformers>=3.0.0
+ torch>=2.0.0
+ transformers>=4.30.0
+ numpy<2.0.0
+ voyageai>=0.2.0
+ einops>=0.6.0