---
title: GAIA Agent
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
---

# πŸ€– GAIA Agent

A sophisticated AI agent designed to solve GAIA (General AI Assistants) benchmark questions using multiple tools and capabilities.

## Overview

This agent is built to tackle the GAIA benchmark, which tests AI systems on real-world tasks requiring reasoning, multi-modal understanding, web browsing, and tool usage. The agent combines multiple tools to provide accurate answers to complex questions.

## Features

The GAIA Agent has access to the following tools:

- **Web Search** (DuckDuckGo): Search the web for up-to-date information
- **Code Interpreter**: Execute Python code for calculations and data processing
- **Image Processing**: Analyze images from URLs
- **Weather Information**: Get weather data for any location
- **Hub Statistics**: Fetch model statistics from Hugging Face Hub
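
In the real project these are registered as smolagents tools in `tools.py`; as a minimal sketch of the underlying dispatch pattern (all function and tool names here are hypothetical placeholders), each tool is a named callable the agent can look up and invoke:

```python
# Hypothetical tool registry illustrating the dispatch pattern.
# The actual project registers these as smolagents tools in tools.py.
def web_search(query: str) -> str:
    return f"results for: {query}"      # placeholder body

def get_weather(location: str) -> str:
    return f"weather in {location}"     # placeholder body

TOOLS = {
    "web_search": web_search,
    "get_weather": get_weather,
}

def call_tool(name: str, argument: str) -> str:
    """Look up a tool by name and invoke it with a single argument."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](argument)
```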

## Architecture

- **Framework**: smolagents
- **Model**: OpenRouter API (meta-llama/llama-3.3-70b-instruct:free)
- **Planning**: Enabled, with a planning step every 3 agent steps
- **Base Tools**: smolagents base tools enabled alongside the custom tools
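
The planning interval means the agent periodically pauses to reassess its strategy. smolagents handles this internally, but the control flow can be sketched roughly like this (a simplified illustration, not the framework's actual implementation):

```python
def run_with_planning(steps, planning_interval=3):
    """Execute steps, inserting a planning pass every `planning_interval` steps."""
    trace = []
    for i, step in enumerate(steps):
        if i % planning_interval == 0:
            trace.append("plan")  # reassess strategy before this step
        trace.append(step)
    return trace
```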

## Project Structure

```
agent_hugging/
β”œβ”€β”€ agent.py              # Main agent implementation
β”œβ”€β”€ app.py                # Gradio interface for interaction
β”œβ”€β”€ code_interpreter.py   # Python code execution tool
β”œβ”€β”€ image_processing.py   # Image analysis tool
β”œβ”€β”€ tools.py              # Custom tools (search, weather, hub stats)
β”œβ”€β”€ system_prompt.txt     # System prompt for the agent
β”œβ”€β”€ requirements.txt      # Python dependencies
└── README.md             # This file
```

## Setup

1. **Install dependencies:**
```bash
pip install -r requirements.txt
```

2. **Set environment variables:**
```bash
export OPENROUTER_API_KEY="your-api-key-here"
export HF_TOKEN="your-huggingface-token"  # Optional, for Hugging Face Hub operations
export HF_USERNAME="ArdaKaratas"  # Optional, defaults to ArdaKaratas
export HF_SPACE_NAME="agent_hugging"  # Optional, defaults to agent_hugging
```

Get a free API key from: https://openrouter.ai/keys
Get your Hugging Face token from: https://huggingface.co/settings/tokens
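
The optional variables fall back to their defaults when unset; a sketch of how that lookup can work (the exact handling lives in `agent.py`/`app.py`, and the function name here is illustrative):

```python
import os

def load_config():
    """Read runtime configuration from the environment, with documented defaults."""
    return {
        "api_key": os.environ["OPENROUTER_API_KEY"],   # required
        "hf_token": os.environ.get("HF_TOKEN"),        # optional, may be None
        "hf_username": os.environ.get("HF_USERNAME", "ArdaKaratas"),
        "hf_space_name": os.environ.get("HF_SPACE_NAME", "agent_hugging"),
    }
```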

3. **Run the agent:**
```bash
python agent.py
```

4. **Launch the Gradio interface:**
```bash
python app.py
```

## Usage

### Testing a Single Question

Use the "Test Single Question" tab in the Gradio interface to:
- Enter a question manually
- Fetch a random question from the benchmark
- Get the agent's answer

### Submitting All Answers

Use the "Submit All Answers" tab to:
1. Enter your Hugging Face username
2. Optionally provide your Space code link
3. Click "Process & Submit All Questions"
4. View the submission status and results

### Viewing Questions

Use the "View All Questions" tab to browse all GAIA benchmark questions.

## API Integration

The app connects to the scoring API at: `https://agents-course-unit4-scoring.hf.space`

Endpoints:
- `GET /questions`: Retrieve all questions
- `GET /random-question`: Get a random question
- `POST /submit`: Submit answers for scoring
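
The GET endpoints can be called with the standard library alone; a hedged sketch (the response is assumed to be JSON, and the helper names are illustrative):

```python
import json
import urllib.request

BASE_URL = "https://agents-course-unit4-scoring.hf.space"

def endpoint(path: str) -> str:
    """Build a full URL for a scoring-API path."""
    return f"{BASE_URL}/{path.lstrip('/')}"

def get_questions():
    """Fetch all benchmark questions (performs a real network request)."""
    with urllib.request.urlopen(endpoint("questions")) as resp:
        return json.load(resp)
```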

## Metadata.jsonl Support

The project includes `metadata.jsonl`, which contains GAIA benchmark questions and their correct answers. This file is used for:

1. **Testing & Validation**: Compare agent answers with correct answers from metadata
2. **Debugging**: See expected answers when testing the agent
3. **Development**: Understand question patterns and expected answer formats

**Note**: In production, the agent generates its own answers. The metadata is only used for comparison and validation purposes.
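
Since a `.jsonl` file holds one JSON object per line, the comparison can be sketched like this (the `final_answer` field name is an assumption; check the actual file for the real keys):

```python
import json

def load_metadata(path: str) -> list[dict]:
    """Read a .jsonl file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def exact_match(agent_answer: str, expected: str) -> bool:
    """The benchmark scores answers by exact string match."""
    return agent_answer == expected
```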

## Notes

- The agent returns answers directly, without a "FINAL ANSWER" prefix
- Answers are compared by exact string match
- Make sure your Space is public for verification
- The code interpreter has security restrictions to prevent dangerous operations
- Use the "Compare with metadata.jsonl" checkbox in the test interface to see how your agent's answers compare to the correct answers

## License

This project is part of the Hugging Face AI Agents Course - Unit 4 Final Assignment.