When I query the topic of weather, the highest-scoring match is "Gaming":
$ python3 -c "
import requests
import json

queries = [
    'Search for weather information',
    'get weather forecast',
    'weather report',
    'temperature today'
]
weather_text = 'Current weather conditions and forecasts'
gaming_text = 'Get List of all Lost Ark Cards details'

for query in queries:
    payload = {'model': 'KaLM', 'input': query}
    resp = requests.post('http://localhost:8001/v1/embeddings', json=payload)
    query_emb = resp.json()['data'][0]['embedding']
    for i, text in enumerate([weather_text, gaming_text]):
        payload2 = {'model': 'KaLM', 'input': text}
        resp2 = requests.post('http://localhost:8001/v1/embeddings', json=payload2)
        emb = resp2.json()['data'][0]['embedding']
        sim = sum([q*e for q, e in zip(query_emb, emb)]) / (sum([q**2 for q in query_emb])**0.5 * sum([e**2 for e in emb])**0.5)
        print(f'{query} -> {\"Weather\" if i==0 else \"Gaming\"}: {sim:.4f}')
    print()
"
Search for weather information -> Weather: 0.7216
Search for weather information -> Gaming: 0.9537
get weather forecast -> Weather: 0.7007
get weather forecast -> Gaming: 0.9006
weather report -> Weather: 0.6919
weather report -> Gaming: 0.8948
temperature today -> Weather: 0.6992
temperature today -> Gaming: 0.8919
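(As an aside, the loop above re-embeds both document texts for every query; the /v1/embeddings endpoint also accepts a list of strings, so the documents could be embedded once up front. A standard-library-only sketch against the same local server, where the server URL and model name match the transcript above:)

```python
import json
import urllib.request

def order_embeddings(data):
    # The API returns one item per input; sort by the "index" field so the
    # embeddings line up with the order of the input texts.
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]

def embed_batch(texts, model="KaLM", url="http://localhost:8001/v1/embeddings"):
    # One POST for all inputs instead of one request per text.
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return order_embeddings(json.load(resp)["data"])
```

This also removes one source of noise when comparing runs, since each document is embedded exactly once.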
I tested it on my own deployed service using the data you provided. The results appear to be consistent with expectations:
Search for weather information -> Weather: 0.8248
get weather forecast -> Gaming: 0.5281
Search for weather information -> Weather: 0.7134
get weather forecast -> Gaming: 0.4050
Search for weather information -> Weather: 0.7887
get weather forecast -> Gaming: 0.4070
Search for weather information -> Weather: 0.7154
get weather forecast -> Gaming: 0.3772
The code I ran is as follows (where my_get_embedding_func is the function for calling my own vLLM deployed service):
queries = [
    'Search for weather information',
    'get weather forecast',
    'weather report',
    'temperature today'
]
weather_text = 'Current weather conditions and forecasts'
gaming_text = 'Get List of all Lost Ark Cards details'

for query in queries:
    query_emb = my_get_embedding_func(query)
    for i, text in enumerate([weather_text, gaming_text]):
        emb = my_get_embedding_func(text)
        sim = sum([q*e for q, e in zip(query_emb, emb)]) / (sum([q**2 for q in query_emb])**0.5 * sum([e**2 for e in emb])**0.5)
        print(f'{query} -> {"Weather" if i==0 else "Gaming"}: {sim:.4f}')
    print()
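The inline similarity expression can also be factored into a small helper, which makes the comparison easier to read and reuse (a sketch assuming numpy is installed):

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product of the two embedding vectors divided by the product of
    # their L2 norms.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```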
Could you please provide details on your model deployment? It would be helpful to verify if the parameters are being loaded correctly.
The response you get seems as expected, but mine is bad. And when I use the same vLLM config and code to test Qwen3-Embedding-8B, nothing unexpected happens, so I don't know why. The details are as follows:
=======KaLM-Embedding-Gemma3-12B-2511========
vllm serve the/path/to/KaLM-Embedding-Gemma3-12B-2511 \
    --served-model-name "KaLM" \
    --trust-remote-code \
    --dtype auto \
    --tensor-parallel-size 1 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.95 \
    --port 8001
$ python3 -c "
import requests
import json

queries = [
    'Search for weather information',
    'get weather forecast',
    'weather report',
    'temperature today'
]
weather_text = 'Current weather conditions and forecasts'
gaming_text = 'Get List of all Lost Ark Cards details'

for query in queries:
    payload = {'model': 'KaLM', 'input': query}
    resp = requests.post('http://localhost:8001/v1/embeddings', json=payload)
    query_emb = resp.json()['data'][0]['embedding']
    for i, text in enumerate([weather_text, gaming_text]):
        payload2 = {'model': 'KaLM', 'input': text}
        resp2 = requests.post('http://localhost:8001/v1/embeddings', json=payload2)
        emb = resp2.json()['data'][0]['embedding']
        sim = sum([q*e for q, e in zip(query_emb, emb)]) / (sum([q**2 for q in query_emb])**0.5 * sum([e**2 for e in emb])**0.5)
        print(f'{query} -> {\"Weather\" if i==0 else \"Gaming\"}: {sim:.4f}')
    print()
"
Search for weather information -> Weather: 0.7216
Search for weather information -> Gaming: 0.9537
get weather forecast -> Weather: 0.7007
get weather forecast -> Gaming: 0.9006
weather report -> Weather: 0.6919
weather report -> Gaming: 0.8948
temperature today -> Weather: 0.6992
temperature today -> Gaming: 0.8919
===============Qwen3-Embedding===============
python3 -c "
import requests
import json

queries = [
    'Search for weather information',
    'get weather forecast',
    'weather report',
    'temperature today'
]
weather_text = 'Current weather conditions and forecasts'
gaming_text = 'Get List of all Lost Ark Cards details'

for query in queries:
    payload = {'model': 'Qwen3-Embedding', 'input': query}
    resp = requests.post('http://localhost:8001/v1/embeddings', json=payload)
    query_emb = resp.json()['data'][0]['embedding']
    for i, text in enumerate([weather_text, gaming_text]):
        payload2 = {'model': 'Qwen3-Embedding', 'input': text}
        resp2 = requests.post('http://localhost:8001/v1/embeddings', json=payload2)
        emb = resp2.json()['data'][0]['embedding']
        sim = sum([q*e for q, e in zip(query_emb, emb)]) / (sum([q**2 for q in query_emb])**0.5 * sum([e**2 for e in emb])**0.5)
        print(f'{query} -> {\"Weather\" if i==0 else \"Gaming\"}: {sim:.4f}')
    print()
"
Search for weather information -> Weather: 0.8688
Search for weather information -> Gaming: 0.4566
get weather forecast -> Weather: 0.8422
get weather forecast -> Gaming: 0.4513
weather report -> Weather: 0.8197
weather report -> Gaming: 0.4280
temperature today -> Weather: 0.7379
temperature today -> Gaming: 0.4008
I missed the vLLM CLI for Qwen; here it is:
vllm serve the/path/to/Qwen3-Embedding-8B \
    --served-model-name "Qwen3-Embedding" \
    --trust-remote-code \
    --dtype auto \
    --tensor-parallel-size 1 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.95 \
    --port 8001
Oh, it looks like you are using vLLM.
Just to confirm, did you notice the explanation here: https://huggingface.co/tencent/KaLM-Embedding-Gemma3-12B-2511#vllm-support?
When loading with vLLM, you need to use the parameters from the CausalLM branch.
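For example (a sketch; the revision name "CausalLM" is taken from the model card's vLLM note above, the local directory name is arbitrary, and huggingface-cli must be installed):

```shell
# Download the CausalLM branch explicitly, then point vLLM at that copy,
# so the embedding-branch weights are not picked up by mistake.
huggingface-cli download tencent/KaLM-Embedding-Gemma3-12B-2511 \
    --revision CausalLM \
    --local-dir ./KaLM-Embedding-Gemma3-12B-2511-CausalLM

vllm serve ./KaLM-Embedding-Gemma3-12B-2511-CausalLM \
    --served-model-name "KaLM" \
    --trust-remote-code \
    --dtype auto \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.95 \
    --port 8001
```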
I noticed that explanation early on, but I may need to check whether I am actually using that branch. Thank you for your inspiring work and your patience in responding to me.
Hi there, just checking back in—has the issue here been identified and fixed yet?
