agentica-org/DeepSWE-Verifier


michaelzhiluo committed on Jul 2

Commit eb8b7fc · verified · 1 Parent(s): 605585b

Update README.md

Files changed (1)

README.md +1 -1


@@ -48,7 +48,7 @@ DeepSWE-Verifier is a fine-tuned/SFT version of [Qwen/Qwen3-14B](https://hugging
 Discover more about DeepSWE-Preview's development and capabilities in our [technical blog post](www.google.com).
 <div style="margin: 0 auto;">
-  <img src="https://cdn-lfs-us-1.hf.co/repos/fe/8c/fe8cf2197ba6bcf2ded6d3e131c2688d33f84166d98d6faf3da79cf572a06253/f6503dea8049dd709774c1f6cd1837867f5756ec82c24306a91834e30f66e767?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27bestk_plot_agent.png%3B+filename%3D%22bestk_plot_agent.png%22%3B&response-content-type=image%2Fpng&Expires=1751412784&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc1MTQxMjc4NH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zL2ZlLzhjL2ZlOGNmMjE5N2JhNmJjZjJkZWQ2ZDNlMTMxYzI2ODhkMzNmODQxNjZkOThkNmZhZjNkYTc5Y2Y1NzJhMDYyNTMvZjY1MDNkZWE4MDQ5ZGQ3MDk3NzRjMWY2Y2QxODM3ODY3ZjU3NTZlYzgyYzI0MzA2YTkxODM0ZTMwZjY2ZTc2Nz9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=HqZtKWygmax5Doo9Sdj09-PeBonl1P5%7ErphhEy01Ry9FPtLN3kKublSGf7uufQzoLdMT9yQJep4MI9WcxTliyCJ2MZqyKC4jfoRaLuRxOvD4TB54TtsZ6ATknvLmXtzcg0uEiZc%7E75IP2aTNk4RVfK211N5pj6u1rZTF45vC9c7xojldEXDLHKoW9zKQx695ULxTtYOHgq3BPexZ4LcOP0AUyTIDOxyEPFeV0jRUNkrGlb7qi3Xbcav6I5jd9HEgJPwioqK2s4JR4HktQS7oOLIrgFuNtjktOU8ReHzb92o7M7SqMWhn37wDU9gMgYui60uArDuTdmkcXCxZolwZVA__&Key-Pair-Id=K24J24Z295AEI9" style="width: 100%;" />
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/654037be97949fd2304aab7f/a7urAV3isk73ZkIbu3d7s.png" style="width: 100%;" />
   <p align="center" style="margin-top: 8px; font-style: italic; color: #666;">
     Figure 1: SWE-Bench Verified Performance w.r.t. different TTS strategies. With hybrid TTS, DeepSWE-Preview achieves 59%, beating the current SOTA open-weights model (SkyWork + TTS, 47%) by 12%. We note that only using execution-based and execution-free verifiers is still effective and can bring 10+% performance.
   </p>