Commit History

adding the search page back #3
67e646a

wbrooks committed on

removed debugging messages now that search is working
b6127b6

wbrooks committed on

making sure all the correct arguments are there in the function calls
bccb1fa

wbrooks committed on

pass IDF DTM to the function factory
b23d55c

wbrooks committed on

removed some cruft from app.py
2fe266e

wbrooks committed on

the factory has to actually return the function
bd1c23b

wbrooks committed on

corrected the name of the fasttext model binary
edfee12

wbrooks committed on

send row names to be included in the results
76dd5e2

wbrooks committed on

build the search functions on app launch, rather than per-query
5ca7119

wbrooks committed on

add the data files necessary for search to version control
01fe4b2

wbrooks committed on

switched to the new search functions using sentence-transformers
68fd999

wbrooks committed on

add sentence_transformers requirement
1b85d5f

wbrooks committed on

added search URL to the README
1aa0d22

wbrooks committed on

renamed modules that do the search
c1d8ce6

wbrooks committed on

changes TF-IDF search to cosine similarity from dot product
ab4ff40

wbrooks committed on

changes TF-IDF search to cosine similarity from dot product
1310186

wbrooks committed on

add the rank-combined column before using it
861d14f

wbrooks committed on

being more selective with the columns that print in the output
9fbd1cf

wbrooks committed on

need to use vectorized strip_prefix to modify a polars column
234c1f5

wbrooks committed on

shortening the name of files
547533f

wbrooks committed on

just return the dist of results directly
7e2a479

wbrooks committed on

trying to get a response
928dc40

wbrooks committed on

testing a simpler response
106e459

wbrooks committed on

return results as JSON
6f54f14

wbrooks committed on

render result to a jinja table
21d4134

wbrooks committed on

apparently need pyarrow for pandas to_html and the huggingface environment-builder is too stupid to install required packages
931423e

wbrooks committed on

working out how to format the result
9916b48

wbrooks committed on

add pandas to requirements
65a3f08

wbrooks committed on

trying to figure out why the last print failed
21b7815

wbrooks committed on

switch tfidf search to use file list saved by joblib
3facea3

wbrooks committed on

reshape with shape, not size
1cf6271

wbrooks committed on

reduce files list to 1-d
7b687d4

wbrooks committed on

convert output to string
881f70b

wbrooks committed on

testing separate query methods
c70ddc5

wbrooks committed on

trying to test app.py
35d7ab6

wbrooks committed on

trying to test app.py
a5caccb

wbrooks committed on

fixed a typo and added a test endpoint
f6d14bf

wbrooks committed on

use block_embeddings_df from the compressed serialized parquet file
68239c7

wbrooks committed on

added transformers to requirements
2382b9c

wbrooks committed on

copied encode function directly into search_embeddings.py
6b6def4

wbrooks committed on

need to specify path to encode because this is a hacky prototype
88bbcb9

wbrooks committed on

use a pre-serialized dtm_svd
d503cc1

wbrooks committed on

allow pickle for deserializing data
4e5f6d2

wbrooks committed on

put search on an API endpoint
6863a59

wbrooks committed on

adding very basic app
e2ee208

wbrooks committed on

load the DTM and file list from serialized versions
21ca93f

wbrooks committed on

don't need to install glob
17f8024

wbrooks committed on

added scripts for testing inference
c795cd4

wbrooks committed on

add huggingface_hub to requirements
fe9eb34

wbrooks committed on

added glob to requirements
16a8955

wbrooks committed on