---
title: GAIA Agent
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
---

# πŸ€– GAIA Agent

A sophisticated AI agent designed to solve GAIA (General AI Assistants) benchmark questions using multiple tools and capabilities.

## Overview

This agent is built to tackle the GAIA benchmark, which tests AI systems on real-world tasks requiring reasoning, multi-modal understanding, web browsing, and tool usage. The agent combines multiple tools to provide accurate answers to complex questions.

## Features

The GAIA Agent has access to the following tools:

- **Web Search** (DuckDuckGo): Search the web for up-to-date information
- **Code Interpreter**: Execute Python code for calculations and data processing
- **Image Processing**: Analyze images from URLs
- **Weather Information**: Get weather data for any location
- **Hub Statistics**: Fetch model statistics from Hugging Face Hub
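
In the real project these are registered as smolagents tools in `tools.py`; as a minimal sketch of the underlying dispatch pattern (all function and tool names here are hypothetical placeholders), each tool is a named callable the agent can look up and invoke:

```python
# Hypothetical tool registry illustrating the dispatch pattern.
# The actual project registers these as smolagents tools in tools.py.
def web_search(query: str) -> str:
    return f"results for: {query}"      # placeholder body

def get_weather(location: str) -> str:
    return f"weather in {location}"     # placeholder body

TOOLS = {
    "web_search": web_search,
    "get_weather": get_weather,
}

def call_tool(name: str, argument: str) -> str:
    """Look up a tool by name and invoke it with a single argument."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](argument)
```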

## Architecture

- **Framework**: smolagents
- **Model**: OpenRouter API (meta-llama/llama-3.3-70b-instruct:free)
- **Planning**: Enabled, with a planning step every 3 agent steps
- **Base Tools**: smolagents base tools enabled alongside the custom tools
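
The planning interval means the agent periodically pauses to reassess its strategy. smolagents handles this internally, but the control flow can be sketched roughly like this (a simplified illustration, not the framework's actual implementation):

```python
def run_with_planning(steps, planning_interval=3):
    """Execute steps, inserting a planning pass every `planning_interval` steps."""
    trace = []
    for i, step in enumerate(steps):
        if i % planning_interval == 0:
            trace.append("plan")  # reassess strategy before this step
        trace.append(step)
    return trace
```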

## Project Structure

```
agent_hugging/
β”œβ”€β”€ agent.py              # Main agent implementation
β”œβ”€β”€ app.py                # Gradio interface for interaction
β”œβ”€β”€ code_interpreter.py   # Python code execution tool
β”œβ”€β”€ image_processing.py   # Image analysis tool
β”œβ”€β”€ tools.py              # Custom tools (search, weather, hub stats)
β”œβ”€β”€ system_prompt.txt     # System prompt for the agent
β”œβ”€β”€ requirements.txt      # Python dependencies
└── README.md             # This file
```

## Setup

1. **Install dependencies:**
```bash
pip install -r requirements.txt
```

2. **Set environment variables:**
```bash
export OPENROUTER_API_KEY="your-api-key-here"
export HF_TOKEN="your-huggingface-token"  # Optional, for Hugging Face Hub operations
export HF_USERNAME="ArdaKaratas"  # Optional, defaults to ArdaKaratas
export HF_SPACE_NAME="agent_hugging"  # Optional, defaults to agent_hugging
```

Get a free API key from: https://openrouter.ai/keys
Get your Hugging Face token from: https://huggingface.co/settings/tokens
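
The optional variables fall back to their defaults when unset; a sketch of how that lookup can work (the exact handling lives in `agent.py`/`app.py`, and the function name here is illustrative):

```python
import os

def load_config():
    """Read runtime configuration from the environment, with documented defaults."""
    return {
        "api_key": os.environ["OPENROUTER_API_KEY"],   # required
        "hf_token": os.environ.get("HF_TOKEN"),        # optional, may be None
        "hf_username": os.environ.get("HF_USERNAME", "ArdaKaratas"),
        "hf_space_name": os.environ.get("HF_SPACE_NAME", "agent_hugging"),
    }
```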

3. **Run the agent:**
```bash
python agent.py
```

4. **Launch the Gradio interface:**
```bash
python app.py
```

## Usage

### Testing a Single Question

Use the "Test Single Question" tab in the Gradio interface to:
- Enter a question manually
- Fetch a random question from the benchmark
- Get the agent's answer

### Submitting All Answers

Use the "Submit All Answers" tab to:
1. Enter your Hugging Face username
2. Optionally provide your Space code link
3. Click "Process & Submit All Questions"
4. View the submission status and results

### Viewing Questions

Use the "View All Questions" tab to browse all GAIA benchmark questions.

## API Integration

The app connects to the scoring API at: `https://agents-course-unit4-scoring.hf.space`

Endpoints:
- `GET /questions`: Retrieve all questions
- `GET /random-question`: Get a random question
- `POST /submit`: Submit answers for scoring
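
The GET endpoints can be called with the standard library alone; a hedged sketch (the response is assumed to be JSON, and the helper names are illustrative):

```python
import json
import urllib.request

BASE_URL = "https://agents-course-unit4-scoring.hf.space"

def endpoint(path: str) -> str:
    """Build a full URL for a scoring-API path."""
    return f"{BASE_URL}/{path.lstrip('/')}"

def get_questions():
    """Fetch all benchmark questions (performs a real network request)."""
    with urllib.request.urlopen(endpoint("questions")) as resp:
        return json.load(resp)
```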

## Metadata.jsonl Support

The project includes `metadata.jsonl`, which contains GAIA benchmark questions and their correct answers. This file is used for:

1. **Testing & Validation**: Compare agent answers with correct answers from metadata
2. **Debugging**: See expected answers when testing the agent
3. **Development**: Understand question patterns and expected answer formats

**Note**: In production, the agent generates its own answers. The metadata is only used for comparison and validation purposes.
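
Since a `.jsonl` file holds one JSON object per line, the comparison can be sketched like this (the `final_answer` field name is an assumption; check the actual file for the real keys):

```python
import json

def load_metadata(path: str) -> list[dict]:
    """Read a .jsonl file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def exact_match(agent_answer: str, expected: str) -> bool:
    """The benchmark scores answers by exact string match."""
    return agent_answer == expected
```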

## Notes

- The agent returns answers directly, without a "FINAL ANSWER" prefix
- Answers are compared by exact string match
- Make sure your Space is public for verification
- The code interpreter has security restrictions to prevent dangerous operations
- Use the "Compare with metadata.jsonl" checkbox in the test interface to see how your agent's answers compare to the correct answers

## License

This project is part of the Hugging Face AI Agents Course - Unit 4 Final Assignment.