---
title: GAIA Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
---

🤖 GAIA Agent

An AI agent that answers GAIA (General AI Assistants) benchmark questions by combining a large language model with web search, code execution, and other tools.

Overview

GAIA tests AI systems on real-world tasks that require reasoning, multi-modal understanding, web browsing, and tool use. Rather than relying on the language model alone, the agent calls the tools listed below and combines their results to answer each question.

Features

The GAIA Agent has access to the following tools:

  • Web Search (DuckDuckGo): Search the web for the latest information
  • Code Interpreter: Execute Python code for calculations and data processing
  • Image Processing: Analyze images from URLs
  • Weather Information: Get weather data for any location
  • Hub Statistics: Fetch model statistics from the Hugging Face Hub
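The README does not spell out which restrictions `code_interpreter.py` applies. As a hypothetical illustration only, one common approach is to parse the code with Python's `ast` module, reject blocked imports and builtins, and then run it with `exec` while capturing stdout. The `BLOCKED_*` sets and the function names are assumptions, and a blocklist like this can be bypassed, so it is not a substitute for a real sandbox.

```python
import ast
import contextlib
import io

# Hypothetical blocklists; the real code_interpreter.py may restrict different things.
BLOCKED_MODULES = {"os", "subprocess", "shutil", "socket", "sys"}
BLOCKED_CALLS = {"eval", "exec", "open", "__import__", "compile"}


def check_code(source: str) -> None:
    """Raise ValueError if the source imports a blocked module or calls a blocked builtin."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            # "os.path" is blocked because its top-level package is "os".
            if name.split(".")[0] in BLOCKED_MODULES:
                raise ValueError(f"import of '{name}' is not allowed")
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                raise ValueError(f"call to '{node.func.id}' is not allowed")


def run_python(source: str) -> str:
    """Check the code, execute it in a fresh namespace, and return what it printed."""
    check_code(source)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {"__name__": "__sandbox__"})
    return buffer.getvalue()
```

For example, `run_python("print(2 + 3)")` returns the captured output, while `run_python("import os")` raises `ValueError` before anything executes.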

Architecture

  • Framework: smolagents
  • Model: OpenRouter API (meta-llama/llama-3.3-70b-instruct:free)
  • Planning: Enabled, with a planning interval of 3 steps (the agent revisits its plan every 3 steps)
  • Base Tools: smolagents' default base tools are added alongside the custom tools
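In smolagents, the planning interval is set through the agent's `planning_interval` argument. The sketch below lists which steps would include a planning pass with an interval of 3, assuming planning runs on the first step and then every `planning_interval` steps after it. `planning_steps` is a hypothetical helper, and the exact scheduling may differ between smolagents versions.

```python
from typing import List, Optional


def planning_steps(max_steps: int, planning_interval: Optional[int]) -> List[int]:
    """Return the 1-indexed step numbers at which a planning pass would run.

    Assumes planning happens on step 1 and then every `planning_interval`
    steps; with no interval, planning never runs.
    """
    if planning_interval is None:
        return []
    return [
        step
        for step in range(1, max_steps + 1)
        if (step - 1) % planning_interval == 0
    ]


print(planning_steps(10, 3))  # → [1, 4, 7, 10]
```

A shorter interval gives the model more chances to correct course on long multi-tool questions, at the cost of extra LLM calls per run.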

Project Structure

agent_hugging/
├── agent.py              # Main agent implementation
├── app.py                # Gradio interface for interaction
├── code_interpreter.py   # Python code execution tool
├── image_processing.py   # Image analysis tool
├── tools.py              # Custom tools (search, weather, hub stats)
├── system_prompt.txt     # System prompt for the agent
├── requirements.txt      # Python dependencies
└── README.md             # This file

Setup

  1. Install dependencies:
pip install -r requirements.txt
  2. Set environment variables:
export OPENROUTER_API_KEY="your-api-key-here"
export HF_TOKEN="your-huggingface-token"  # Optional, for Hugging Face Hub operations
export HF_USERNAME="ArdaKaratas"  # Optional, defaults to ArdaKaratas
export HF_SPACE_NAME="agent_hugging"  # Optional, defaults to agent_hugging

Get a free OpenRouter API key from: https://openrouter.ai/keys
Get your Hugging Face token from: https://huggingface.co/settings/tokens

  3. Run the agent:
python agent.py
  4. Launch the Gradio interface:
python app.py
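A minimal sketch of reading the environment variables from the setup steps, applying the defaults listed there. `Settings` and `load_settings` are hypothetical names rather than code from `agent.py`, and treating `OPENROUTER_API_KEY` as the only required variable is an assumption based on it being the one not marked optional.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    hf_token: Optional[str]
    hf_username: str
    hf_space_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the setup variables, falling back to the documented defaults."""
    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        # Assumed to be the only required variable.
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return Settings(
        openrouter_api_key=api_key,
        hf_token=env.get("HF_TOKEN"),  # optional, may be None
        hf_username=env.get("HF_USERNAME", "ArdaKaratas"),
        hf_space_name=env.get("HF_SPACE_NAME", "agent_hugging"),
    )
```

Accepting a mapping instead of reading `os.environ` directly keeps the function easy to test with a plain dictionary.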

Usage

Testing a Single Question

Use the "Test Single Question" tab in the Gradio interface to:

  • Enter a question manually
  • Fetch a random question from the benchmark
  • Get the agent's answer

Submitting All Answers

Use the "Submit All Answers" tab to:

  1. Enter your Hugging Face username
  2. Optionally provide your Space code link
  3. Click "Process & Submit All Questions"
  4. View the submission status and results
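The optional Space code link normally points at the Space's file tree. The helpers below are hypothetical; the payload shape (`username`, `agent_code`, and a list of `task_id`/`submitted_answer` pairs) is assumed from the course's template Space and is not documented in this README, so check it against the scoring API before relying on it.

```python
from typing import Dict, List


def space_code_link(username: str, space_name: str) -> str:
    """Link to the Space's files, suitable for the optional 'Space code link' field."""
    return f"https://huggingface.co/spaces/{username}/{space_name}/tree/main"


def build_submission(username: str, agent_code: str, answers: Dict[str, str]) -> dict:
    """Assemble an assumed POST /submit body from a task_id -> answer mapping."""
    answer_list: List[dict] = [
        {"task_id": task_id, "submitted_answer": answer}
        for task_id, answer in answers.items()
    ]
    return {
        "username": username.strip(),  # stray spaces from the text box are removed
        "agent_code": agent_code,
        "answers": answer_list,
    }
```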

Viewing Questions

Use the "View All Questions" tab to browse all GAIA benchmark questions.

API Integration

The app connects to the scoring API at: https://agents-course-unit4-scoring.hf.space

Endpoints:

  • GET /questions: Retrieve all questions
  • GET /random-question: Get a random question
  • POST /submit: Submit answers for scoring
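A minimal client for these endpoints using only the standard library (`urllib.request` and `json`). The function names and timeouts are assumptions, and since the README does not document the response bodies, this sketch simply decodes whatever JSON the API returns.

```python
import json
import urllib.request

API_URL = "https://agents-course-unit4-scoring.hf.space"


def get_json(path: str, timeout: float = 30.0):
    """GET an endpoint such as /questions or /random-question and decode its JSON body."""
    with urllib.request.urlopen(API_URL + path, timeout=timeout) as response:
        return json.load(response)


def build_submit_request(payload: dict) -> urllib.request.Request:
    """Prepare a POST /submit request with a JSON-encoded body."""
    return urllib.request.Request(
        API_URL + "/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_answers(payload: dict, timeout: float = 60.0):
    """Send the submission and return the decoded JSON response."""
    with urllib.request.urlopen(build_submit_request(payload), timeout=timeout) as response:
        return json.load(response)


# Usage (requires network access):
# questions = get_json("/questions")
# one_question = get_json("/random-question")
```

Keeping request construction in `build_submit_request` separate from `submit_answers` makes the outgoing request easy to inspect without calling the API.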

Metadata.jsonl Support

The project includes metadata.jsonl, which contains GAIA benchmark questions and their correct answers. This file is used for:

  1. Testing & Validation: Compare agent answers with correct answers from metadata
  2. Debugging: See expected answers when testing the agent
  3. Development: Understand question patterns and expected answer formats

Note: In production, the agent generates its own answers. The metadata is only used for comparison and validation purposes.
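Loading the file for comparison can be as simple as parsing one JSON object per line. This sketch assumes each record carries `task_id` and `Final answer` keys, as in the public GAIA metadata files; `parse_metadata` and `load_metadata` are hypothetical names, so adjust the keys if this project's file differs.

```python
import json
from typing import Dict, Iterable


def parse_metadata(lines: Iterable[str]) -> Dict[str, str]:
    """Map task_id -> expected answer from JSON Lines text."""
    expected: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Store answers as strings so they can be compared with the agent's text output.
        expected[record["task_id"]] = str(record["Final answer"])
    return expected


def load_metadata(path: str = "metadata.jsonl") -> Dict[str, str]:
    """Read a metadata file from disk and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_metadata(handle)
```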

Notes

  • The agent returns answers directly, without a "FINAL ANSWER" prefix
  • Answers are scored by exact match, so extra words or different formatting make an answer count as wrong
  • Make sure your Space is public so your code can be verified
  • The code interpreter has security restrictions to prevent dangerous operations
  • Use the "Compare with metadata.jsonl" checkbox in the test interface to see how your agent's answers compare to the correct answers
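Because scoring is exact match, it can help to normalize the agent's raw output before submitting it or comparing it with metadata.jsonl. The helpers below are a hypothetical sketch: they remove a stray "FINAL ANSWER" prefix and surrounding whitespace, then compare case-sensitively. Whether the scoring API applies any normalization of its own is not stated here.

```python
import re

# Matches an optional leading "FINAL ANSWER:" (any case) that models sometimes emit anyway.
_PREFIX = re.compile(r"^\s*final\s+answer\s*:?\s*", re.IGNORECASE)


def clean_answer(raw: str) -> str:
    """Strip a stray FINAL ANSWER prefix and surrounding whitespace."""
    return _PREFIX.sub("", raw, count=1).strip()


def matches_expected(answer: str, expected: str) -> bool:
    """Case-sensitive exact-match comparison after cleaning the agent output."""
    return clean_answer(answer) == expected.strip()
```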

License

This project is part of the Hugging Face AI Agents Course - Unit 4 Final Assignment.