In one line: agent quality should be managed not as "it runs fine" but as "it passes the same criteria every time it is run again."
This installment shows how to attach tool tests and an evaluation loop (eval) to a smolagents-based agent so that reliability is established before production deployment.

Why an evaluation loop is needed now

  • An agent may call its tools in a different order even for the same question.
  • A fluent answer can still miss the conditions that make it correct.
  • So fix the automatic pass/fail criteria first, before generating any answer text.
flowchart LR
  A[Request input] --> B[Run CodeAgent]
  B --> C[Collect tool call logs]
  C --> D{Evaluation rules passed?}
  D -->|Yes| E[Deployment candidate]
  D -->|No| F[Revise prompt/tools]
  F --> B
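
The loop in the flowchart can be tried without any model call. The following is a minimal sketch, assuming a hypothetical `flaky_agent` stand-in and hypothetical `passes` and `pass_rate` helpers, none of which are part of smolagents. It illustrates why one fixed pass/fail rule applied over repeated runs says more than a single good run.

```python
import itertools
from typing import Callable, List


def passes(output: str, must_include: List[str]) -> bool:
    """Fixed pass/fail rule: every required token must appear in the output."""
    return all(token in output for token in must_include)


def pass_rate(run: Callable[[str], str], question: str,
              must_include: List[str], repeats: int = 5) -> float:
    """Ask the same question several times and return the fraction that pass."""
    hits = sum(passes(run(question), must_include) for _ in range(repeats))
    return hits / repeats


# Hypothetical stand-in for an agent (an assumption for illustration): it
# alternates between a correct answer and a fluent but wrong one to imitate
# run-to-run variation.
_answers = itertools.cycle(["The fee is 3600 KRW.", "The fee is about 3500 KRW."])


def flaky_agent(question: str) -> str:
    return next(_answers)


rate = pass_rate(flaky_agent, "Shipping fee for 2 kg over 100 km?", ["3600"], repeats=4)
print(rate)
```

With this stand-in, only every other run passes, which a single demo run would not reveal.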

Lab goals

  1. Build a simple task agent with smolagents.
  2. Get the tool-level unit test (selfcheck) to pass.
  3. Compute an evaluation (eval) score on a sample task set.
  4. When the pass threshold is missed, locate what to fix right away.

1) Preparation: folder/environment/packages

Step 1-1. Create the working folder

  • Tool: terminal
  • Input: none
  • Command:
mkdir -p ~/hf-agents-day15 && cd ~/hf-agents-day15
  • ์„ฑ๊ณตํŒ์ •: pwd ๊ฒฐ๊ณผ๊ฐ€ ~/hf-agents-day15

Step 1-2. ๊ฐ€์ƒํ™˜๊ฒฝ + ํŒจํ‚ค์ง€ ์„ค์น˜

  • ๋„๊ตฌ: Python 3.10+
  • ์ž…๋ ฅ: ์—†์Œ
  • ์‹คํ–‰๋ช…๋ น:
python3 -m venv .venv
source .venv/bin/activate
pip install -U smolagents
  • ์„ฑ๊ณตํŒ์ •:
python -c "import smolagents; print('OK')"

If the output shows OK, the step passes.

Step 1-3. Set the model key

  • Tool: environment variables
  • Input: HF_TOKEN or an OpenAI-compatible key
  • Command:
export HF_TOKEN="hf_xxx"
# or
export OPENAI_API_KEY="sk-xxx"
  • ์„ฑ๊ณตํŒ์ •: ํ‚ค๊ฐ€ ๋น„์–ด ์žˆ์ง€ ์•Š์Œ (echo ${HF_TOKEN:+set})

2) Write the example code (tools + agent + evaluation)

Save the file below as is.

day15_eval_loop.py

import json
from dataclasses import dataclass
from typing import Dict, List

from smolagents import CodeAgent, tool

# Newer smolagents releases expose the Hugging Face inference model as
# InferenceClientModel; older ones call it HfApiModel. Accept either.
try:
    from smolagents import InferenceClientModel as HfModel
except ImportError:
    from smolagents import HfApiModel as HfModel


@tool
def shipping_cost(weight_kg: float, distance_km: int, urgent: bool = False) -> str:
    """Calculate the shipping fee from weight, distance, and urgency.

    Args:
        weight_kg: Parcel weight in kilograms.
        distance_km: Delivery distance in kilometers.
        urgent: Whether to add the urgent-delivery surcharge.
    """
    base = 2500
    weight_fee = int(weight_kg * 400)
    distance_fee = int(distance_km * 3)
    urgent_fee = 3000 if urgent else 0
    total = base + weight_fee + distance_fee + urgent_fee
    return json.dumps({"total_krw": total}, ensure_ascii=False)


@tool
def policy_lookup(topic: str) -> str:
    """Look up a simple internal policy by topic.

    Args:
        topic: Policy topic, one of "refund", "delivery", or "warranty".
    """
    table = {
        "refund": "Full refund within 7 days if unopened",
        "delivery": "Orders placed before 15:00 on weekdays ship the same day",
        "warranty": "Electronics carry a 1-year standard warranty",
    }
    return table.get(topic.lower(), "No policy found")


def build_agent():
    model = HfModel("Qwen/Qwen2.5-72B-Instruct")
    return CodeAgent(
        tools=[shipping_cost, policy_lookup],
        model=model,
        max_steps=6,
    )


def selfcheck_tools() -> Dict:
    # Call the tools directly, without the model, to confirm they still work.
    s1 = json.loads(shipping_cost(2.0, 100, False))
    s2 = policy_lookup("refund")
    return {
        "shipping_tool_ok": s1["total_krw"] == 3600,
        "policy_tool_ok": "refund" in s2,
    }


@dataclass
class Case:
    q: str                   # question sent to the agent
    must_include: List[str]  # tokens the answer must contain to pass


def run_eval(agent, cases: List[Case]) -> Dict:
    results = []
    passed = 0

    for c in cases:
        out = str(agent.run(c.q))
        ok = all(token in out for token in c.must_include)
        passed += 1 if ok else 0
        results.append({"question": c.q, "ok": ok, "output": out})

    # Round before comparing so that 2 of 3 (0.666...) meets the 0.67 threshold.
    score = round(passed / len(cases), 2)
    return {
        "total": len(cases),
        "passed": passed,
        "score": score,
        "pass": score >= 0.67,
        "results": results,
    }


if __name__ == "__main__":
    tool_state = selfcheck_tools()
    print("[SELFCHECK]", json.dumps(tool_state, ensure_ascii=False))

    agent = build_agent()
    eval_cases = [
        Case(
            q="Calculate the standard shipping fee for 2 kg over 100 km.",
            must_include=["3600"],
        ),
        Case(
            q="Summarize the refund policy.",
            must_include=["7 days", "refund"],
        ),
        Case(
            q="Calculate the urgent shipping fee for 1 kg over 50 km, then summarize it in one line.",
            must_include=["6050"],
        ),
    ]

    report = run_eval(agent, eval_cases)
    print("[EVAL]", json.dumps(report, ensure_ascii=False, indent=2))
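
The expected tokens 3600 and 6050 in the eval cases follow from the fee formula inside `shipping_cost`. As a worked check, here is a minimal sketch using a hypothetical plain function `fee` that mirrors the tool body without the decorator or the JSON wrapper.

```python
def fee(weight_kg: float, distance_km: int, urgent: bool = False) -> int:
    """Same arithmetic as shipping_cost, returned as a plain int (hypothetical helper)."""
    base = 2500
    weight_fee = int(weight_kg * 400)
    distance_fee = int(distance_km * 3)
    urgent_fee = 3000 if urgent else 0
    return base + weight_fee + distance_fee + urgent_fee


# Case 1: 2500 + 2*400 + 100*3 + 0    = 3600
# Case 3: 2500 + 1*400 + 50*3  + 3000 = 6050
print(fee(2.0, 100), fee(1.0, 50, urgent=True))
```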

3) Run: selfcheck → eval

Step 3-1. Tool-level check

  • Tool: day15_eval_loop.py
  • Input: none
  • Command:
python day15_eval_loop.py
  • ์„ฑ๊ณตํŒ์ •: [SELFHECK](์˜คํƒˆ์ž ๊ทธ๋Œ€๋กœ ์ถœ๋ ฅ๋  ์ˆ˜ ์žˆ์Œ) ๊ฒฐ๊ณผ์—์„œ ์•„๋ž˜ ๋‘˜ ๋‹ค true
    • shipping_tool_ok
    • policy_tool_ok

Step 3-2. Check the evaluation result

  • Tool: the [EVAL] JSON in the same run log
  • Input: the 3 built-in test cases
  • Command: same as Step 3-1
  • Success check:
    • score >= 0.67
    • pass: true
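
Reading the report by eye does not scale, so the tagged JSON can be pulled out of a captured log. This is a minimal sketch, assuming a hypothetical helper `parse_tagged_json` and a made-up `sample_log`; the real log also contains the agent's own step output, which this sketch skips by searching for the last occurrence of the tag.

```python
import json
from typing import Any, Dict


def parse_tagged_json(log: str, tag: str) -> Dict[str, Any]:
    """Decode the first JSON value that follows the last occurrence of `tag`."""
    start = log.rindex(tag) + len(tag)  # raises ValueError if the tag is absent
    # raw_decode stops at the end of the first JSON value, so any log text
    # printed after the report does not break parsing.
    obj, _ = json.JSONDecoder().raw_decode(log[start:].lstrip())
    return obj


# Made-up log text for illustration only; not output from a real run.
sample_log = """step 1 ... agent output ...
[EVAL] {
  "total": 3,
  "passed": 3,
  "score": 1.0,
  "pass": true
}
"""

report = parse_tagged_json(sample_log, "[EVAL]")
print(report["score"], report["pass"])
```

The same function can read the self-check line by passing that line's tag instead.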

4) ์ดˆ๋ณด์ž์šฉ ํ•ด์„ค (์‰ฝ๊ฒŒ ์ดํ•ดํ•˜๊ธฐ)

  • @tool ํ•จ์ˆ˜๋Š” ์—์ด์ „ํŠธ๊ฐ€ ๊บผ๋‚ด ์“ฐ๋Š” ์ž‘์€ ๊ณ„์‚ฐ๊ธฐ/์‚ฌ์ „์ด๋‹ค.
  • selfcheck_tools()๋Š” ๋„๊ตฌ๊ฐ€ ๋ง๊ฐ€์ง€์ง€ ์•Š์•˜๋Š”์ง€ ํ™•์ธํ•˜๋Š” ๊ธฐ๋ณธ ๊ฑด๊ฐ•๊ฒ€์ง„์ด๋‹ค.
  • run_eval()์€ ์งˆ๋ฌธ ๋ฌถ์Œ์„ ๋Œ๋ ค๋ณด๊ณ , ์ •ํ•ด๋‘” ํ‚ค์›Œ๋“œ๊ฐ€ ์žˆ๋Š”์ง€๋กœ ํ†ต๊ณผ๋ฅผ ์ •ํ•˜๋Š” ์ฑ„์ ๊ธฐ๋‹ค.

์ฆ‰, โ€œํ•œ ๋ฒˆ ์ž˜๋œ ๋ฐ๋ชจโ€๊ฐ€ ์•„๋‹ˆ๋ผ โ€œ๋งค๋ฒˆ ํ†ต๊ณผํ•˜๋Š” ์‹œ์Šคํ…œโ€์œผ๋กœ ๋ฐ”๊พธ๋Š” ๊ณผ์ •์ด ํ‰๊ฐ€ ๋ฃจํ”„๋‹ค.
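
The grader idea can be exercised offline with a stub in place of the model. The following is a minimal sketch, assuming a hypothetical `CannedAgent` class and a hypothetical `grade` function that applies the same keyword rule as run_eval(); only the `.run()` method name is borrowed from the agent interface used in this article.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Case:
    q: str
    must_include: List[str]


class CannedAgent:
    """Hypothetical stub: returns a fixed answer per question, no model call."""

    def __init__(self, answers: Dict[str, str]):
        self.answers = answers

    def run(self, question: str) -> str:
        return self.answers.get(question, "")


def grade(agent, cases: List[Case]) -> List[bool]:
    """Keyword grader: a case passes only if every required token appears."""
    return [all(t in str(agent.run(c.q)) for t in c.must_include) for c in cases]


cases = [
    Case("fee?", ["3600"]),
    Case("refund?", ["7 days", "refund"]),
]
agent = CannedAgent({
    "fee?": "The shipping fee is 3600 KRW.",
    # Fluent answer that still fails: it never says "7 days".
    "refund?": "You can get a refund within a week.",
})
print(grade(agent, cases))
```

The second case shows the point made earlier: a natural-sounding answer can still miss a required condition, and the grader catches it.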


5) Applying this in practice

  1. Release gate: stop the deployment automatically when score is below the threshold.
  2. Regression prevention: after every prompt change, rerun the existing cases to detect quality drops.
  3. Standardized operational logs: fixed, parseable tags such as [SELFCHECK] and [EVAL] make dashboard integration easy.
  4. Test case separation: later, move the cases into an eval_cases.json file so the team can maintain them together.
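
Points 1 and 4 can be sketched together. This is a minimal sketch under stated assumptions: the layout of eval_cases.json is not defined in this article, so the list-of-objects shape below is an assumption, and `load_cases` and `gate_exit_code` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List


def load_cases(text: str) -> List[Dict[str, Any]]:
    """Parse eval cases from JSON text; each item needs 'q' and 'must_include'."""
    cases = json.loads(text)
    for c in cases:
        if not {"q", "must_include"} <= c.keys():
            raise ValueError(f"malformed case: {c}")
    return cases


def gate_exit_code(report: Dict[str, Any], min_score: float = 0.67) -> int:
    """Return 0 when the score meets the threshold, 1 otherwise."""
    return 0 if report["score"] >= min_score else 1


# Assumed file content for eval_cases.json (inlined here to stay self-contained).
cases_json = '[{"q": "Summarize the refund policy.", "must_include": ["7 days", "refund"]}]'
cases = load_cases(cases_json)

# Made-up report in the shape returned by run_eval(): 1 of 3 cases passed.
report = {"total": 3, "passed": 1, "score": 0.33}
code = gate_exit_code(report)
print(len(cases), code)
# In a CI script, finish with: import sys; sys.exit(code)
```

A non-zero exit code is what most CI systems treat as a failed step, so the deployment stage after it does not run.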

์ฒดํฌ๋ฆฌ์ŠคํŠธ

  • ๊ฐ€์ƒํ™˜๊ฒฝ ์ƒ์„ฑ/ํ™œ์„ฑํ™” ์™„๋ฃŒ
  • smolagents import ์„ฑ๊ณต
  • HF_TOKEN ๋˜๋Š” API ํ‚ค ์„ค์ • ์™„๋ฃŒ
  • selfcheck 2๊ฐœ ํ•ญ๋ชฉ true
  • eval score 0.67 ์ด์ƒ
  • ์‹คํŒจ ์ผ€์ด์Šค 1๊ฐœ ์ด์ƒ ์›์ธ ๊ธฐ๋ก
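
For the last checklist item, a failure record can be as small as the list of tokens that were missing. This is a minimal sketch, assuming a hypothetical helper `failure_note` and a made-up agent output.

```python
from typing import Dict, List


def failure_note(question: str, output: str, must_include: List[str]) -> Dict[str, object]:
    """Build a small record of which required tokens are missing from an output."""
    missing = [t for t in must_include if t not in output]
    return {"question": question, "missing": missing, "output_head": output[:80]}


note = failure_note(
    "Urgent fee for 1 kg over 50 km?",
    "The urgent shipping fee is 6,050 KRW.",  # made-up output; the comma hides "6050"
    ["6050"],
)
print(note["missing"])
```

Here the arithmetic is right but the thousands separator defeats the keyword match, which points to a fix in the prompt or the matching rule rather than in the tool.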

Reference links (in priority order)

  1. https://github.com/huggingface/agents-course
  2. https://huggingface.co/learn/agents-course
  3. https://huggingface.co/docs/smolagents

์ƒ์„ฑํ˜• AI ํ™œ์šฉ ๊ณ ์ง€

์ด ๋ฌธ์„œ๋Š” ์ƒ์„ฑํ˜• AI๋ฅผ ํ™œ์šฉํ•ด ์ดˆ์•ˆ์„ ์ž‘์„ฑํ–ˆ๊ณ , ์˜ˆ์‹œ ์ฝ”๋“œ/์ ˆ์ฐจ/ํ‘œํ˜„์€ ์‚ฌ๋žŒ ๊ฒ€ํ†  ํ›„ ํ™•์ •ํ–ˆ๋‹ค.