# Autoresearch: switch to OAuth authentication + fix cost calculation accuracy

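## Background
- Autoresearch authenticates only with `ANTHROPIC_API_KEY`, so it cannot run (no key is available).
- The current system uses Claude Max OAuth, so Autoresearch must switch to an OAuth token.
- Cost is calculated entirely at Sonnet rates, so judge (Haiku) tokens are not priced separately.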

## Scope approved by Chairman Jay
- Code changes are approved for this task only (autoresearch code only).
- Files in scope: those under `scripts/autoresearch/`.
- Do not modify any other directory.

---

## Task 1: switch to OAuth authentication

### 1-1. `skill_executor.py`: replace `load_env_key()` with `load_auth()`

`load_env_key()` currently looks only for `ANTHROPIC_API_KEY`. Extend the lookup to follow the priority order below:

**Authentication lookup order:**
1. Environment variable `ANTHROPIC_AUTH_TOKEN` → OAuth token
2. File `~/.claude/.credentials.json` → `claudeAiOauth.accessToken` (check `expiresAt`)
3. Environment variable `ANTHROPIC_API_KEY` → API key
4. `ANTHROPIC_API_KEY` read from the `.env.keys` / `.env` files (existing logic)

**Return value:** a `dict`
```python
def load_auth() -> dict:
    """Search for credentials and return them.

    Returns:
        {"type": "oauth", "auth_token": "sk-ant-oat01-..."}
        or
        {"type": "api_key", "api_key": "sk-ant-api..."}

    Raises:
        EnvironmentError: if no authentication method can be found
    """
```

**Parsing logic for `~/.claude/.credentials.json`:**
```python
import json
import time
from pathlib import Path

# Fragment intended to run inside load_auth(), hence the bare `return` below.
cred_path = Path.home() / ".claude" / ".credentials.json"
if cred_path.exists():
    creds = json.loads(cred_path.read_text())
    oauth = creds.get("claudeAiOauth", {})
    access_token = oauth.get("accessToken")
    expires_at = oauth.get("expiresAt", 0)  # milliseconds timestamp
    if access_token and expires_at > time.time() * 1000:
        return {"type": "oauth", "auth_token": access_token}
```
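
The lookup order and the credentials-file check above could be combined roughly as follows. This is a minimal sketch, not the project's implementation: the `env`, `cred_path`, and `env_files` parameters and the `_read_oauth_token` / `_read_env_file_key` helpers are hypothetical names, introduced so the function can be exercised without touching the real environment, and the `.env` parser is a simplified stand-in for the existing logic.

```python
import json
import os
import time
from pathlib import Path
from typing import Mapping, Optional


def _read_oauth_token(cred_path: Path) -> Optional[str]:
    """Return a non-expired access token from the credentials file, else None."""
    try:
        creds = json.loads(cred_path.read_text())
    except (OSError, ValueError):
        return None  # missing, unreadable, or malformed file: fall through
    oauth = creds.get("claudeAiOauth") if isinstance(creds, dict) else None
    if not isinstance(oauth, dict):
        return None
    token = oauth.get("accessToken")
    expires_at = oauth.get("expiresAt", 0)  # milliseconds since the epoch
    if token and isinstance(expires_at, (int, float)) and expires_at > time.time() * 1000:
        return token
    return None  # absent or expired: let the caller try the next source


def _read_env_file_key(path: Path, key_name: str) -> Optional[str]:
    """Hypothetical stand-in for the existing .env parsing (KEY=value lines)."""
    try:
        lines = path.read_text().splitlines()
    except OSError:
        return None
    for line in lines:
        name, sep, value = line.strip().partition("=")
        if sep and name.strip() == key_name and value.strip():
            return value.strip().strip("'\"")
    return None


def load_auth(
    env: Optional[Mapping[str, str]] = None,
    cred_path: Optional[Path] = None,
    env_files: tuple[Path, ...] = (Path(".env.keys"), Path(".env")),
) -> dict:
    """Try each source in priority order and return the first match."""
    if env is None:
        env = os.environ
    if cred_path is None:
        cred_path = Path.home() / ".claude" / ".credentials.json"

    if env.get("ANTHROPIC_AUTH_TOKEN"):  # 1. OAuth token from the environment
        return {"type": "oauth", "auth_token": env["ANTHROPIC_AUTH_TOKEN"]}
    token = _read_oauth_token(cred_path)  # 2. credentials file
    if token:
        return {"type": "oauth", "auth_token": token}
    if env.get("ANTHROPIC_API_KEY"):  # 3. API key from the environment
        return {"type": "api_key", "api_key": env["ANTHROPIC_API_KEY"]}
    for path in env_files:  # 4. API key from .env files
        key = _read_env_file_key(path, "ANTHROPIC_API_KEY")
        if key:
            return {"type": "api_key", "api_key": key}
    raise EnvironmentError("No Anthropic credentials found")
```

Passing the environment and paths as arguments is only a convenience for testing; calling `load_auth()` with no arguments reads `os.environ` and the default paths.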

**Keep the existing `load_env_key()` (backward compatibility):** leave the existing function in place as a deprecated wrapper.
```python
def load_env_key(key_name: str = "ANTHROPIC_API_KEY") -> str:
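    """[deprecated] Prefer load_auth(). Kept for backward compatibility."""
    # key_name is retained only so existing call sites keep working; it is not consulted here.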
    auth = load_auth()
    if auth["type"] == "api_key":
        return auth["api_key"]
    return auth.get("auth_token", "")
```

### 1-2. `skill_executor.py`: modify `execute_skill()`

**Current:**
```python
client = anthropic.Anthropic(api_key=api_key)
```

**Change:**
```python
def _create_client(auth: dict | None = None, api_key: str | None = None) -> anthropic.Anthropic:
    """Create an Anthropic client from the given credentials."""
    # An explicit api_key takes precedence (backward compatibility).
    if api_key is not None:
        return anthropic.Anthropic(api_key=api_key)

    # Resolving auth here lets type checkers narrow it to dict before subscripting.
    if auth is None:
        auth = load_auth()

    if auth["type"] == "oauth":
        return anthropic.Anthropic(auth_token=auth["auth_token"])
    return anthropic.Anthropic(api_key=auth["api_key"])
```

`execute_skill()` signature: keep the `api_key` parameter (backward compatibility) and add an `auth` parameter.
```python
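from typing import Optional

def execute_skill(
    skill_body: str,
    test_input: str,
    model: str = "claude-sonnet-4-6",
    api_key: Optional[str] = None,
    auth: Optional[dict] = None,
) -> dict[str, object]:
    client = _create_client(auth=auth, api_key=api_key)
    ...  # the rest of the existing body is unchanged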
```

### 1-3. `mutator.py`: modify `generate_mutation()`

**Current (lines 106-108):**
```python
resolved_api_key = api_key if api_key is not None else os.environ.get("ANTHROPIC_API_KEY")
client = anthropic.Anthropic(api_key=resolved_api_key)
```

**Change:** import and use `_create_client()` from `skill_executor.py`.
```python
from scripts.autoresearch.skill_executor import _create_client, load_auth

# Add an auth parameter to the generate_mutation() signature
def generate_mutation(..., api_key: str | None = None, auth: dict | None = None) -> dict:
    client = _create_client(auth=auth, api_key=api_key)
```

### 1-4. `judge.py`: modify `judge_output()`

**Current (lines 137-139):**
```python
resolved_api_key = api_key if api_key is not None else os.environ.get("ANTHROPIC_API_KEY")
client = anthropic.Anthropic(api_key=resolved_api_key)
```

**Change:** use `_create_client()` in the same way.
```python
from scripts.autoresearch.skill_executor import _create_client, load_auth

def judge_output(..., api_key: str | None = None, auth: dict | None = None) -> dict:
    client = _create_client(auth=auth, api_key=api_key)
```

### 1-5. `runner.py`: propagate auth

Call `load_auth()` once at the start of `run()` in `runner.py`, then pass `auth=auth` to every module call.
```python
auth = load_auth()
# Then: execute_skill(auth=auth), generate_mutation(auth=auth), judge_output(auth=auth)
```

---

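## Task 2: fix cost calculation accuracy

### 2-1. `runner.py`: record per-stage token counts in the `run_round()` changelog

**Current (lines 249-262):** all tokens are summed, and only `input_tokens` and `output_tokens` are recorded.

**Change:** record the additional fields.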
```python
# Keep the existing summed fields (backward compatibility)
input_tokens=total_input_tokens,
output_tokens=total_output_tokens,
# Add the new per-stage fields
mutation_input_tokens=mut_input_tokens,
mutation_output_tokens=mut_output_tokens,
execution_input_tokens=total_exec_input_tokens,
execution_output_tokens=total_exec_output_tokens,
judge_input_tokens=judge_input_tokens,
judge_output_tokens=judge_output_tokens,
```
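
For these keyword arguments to be accepted, the changelog writer has to tolerate fields it does not know about. The real `add_round()` signature in `changelog.py` is not shown here, so the following is only an illustrative sketch: the `changelog`, `round_num`, and `score` parameters and the `extra_fields` name are assumptions made for the example.

```python
from typing import Any


def add_round(
    changelog: list[dict[str, Any]],
    round_num: int,
    score: float,
    input_tokens: int = 0,
    output_tokens: int = 0,
    **extra_fields: int,
) -> dict[str, Any]:
    """Append one round entry; extra keyword fields are stored as-is."""
    entry: dict[str, Any] = {
        "round": round_num,
        "score": score,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }
    entry.update(extra_fields)  # e.g. mutation_input_tokens, judge_output_tokens
    changelog.append(entry)
    return entry


log: list[dict[str, Any]] = []
add_round(
    log, round_num=1, score=0.8, input_tokens=1500, output_tokens=400,
    mutation_input_tokens=500, execution_input_tokens=700, judge_input_tokens=300,
)
```

Because the per-stage counters arrive through `**extra_fields`, callers that pass only the summed fields keep working unchanged.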

### 2-2. `reporter.py`: make the `generate_report()` cost calculation accurate

**Current (line 102):**
```python
total_cost = estimate_cost(total_input_tokens, total_output_tokens, model="sonnet")
```

**Change:**
```python
# Mutation + Execution = Sonnet, Judge = Haiku
sonnet_input = sum(int(r.get("mutation_input_tokens", 0)) + int(r.get("execution_input_tokens", 0)) for r in rounds)
sonnet_output = sum(int(r.get("mutation_output_tokens", 0)) + int(r.get("execution_output_tokens", 0)) for r in rounds)
haiku_input = sum(int(r.get("judge_input_tokens", 0)) for r in rounds)
haiku_output = sum(int(r.get("judge_output_tokens", 0)) for r in rounds)

# Backward compatibility: if the new fields are absent, use the old method (all Sonnet)
if sonnet_input == 0 and haiku_input == 0:
    total_cost = estimate_cost(total_input_tokens, total_output_tokens, model="sonnet")
    cost_breakdown = "(Sonnet 기준, 추정치)"  # "(Sonnet rates, estimate)"
else:
    sonnet_cost = estimate_cost(sonnet_input, sonnet_output, model="sonnet")
    haiku_cost = estimate_cost(haiku_input, haiku_output, model="haiku")
    total_cost = sonnet_cost + haiku_cost
    cost_breakdown = f"(Sonnet: ${sonnet_cost:.2f} + Haiku: ${haiku_cost:.2f})"
```

Update the report output:
```python
# "예상 비용" means "Estimated cost"
lines.append(f"- 예상 비용: ${total_cost:.2f} {cost_breakdown}")
```
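
The split calculation can be tried in isolation with a sketch like the one below. It is not the project's code: `summarize_cost`, `_total`, and `placeholder_estimate_cost` are hypothetical names, the cost function is injected so the logic runs without `reporter.py`, the breakdown label is written in English here, and the rates are placeholders rather than real pricing.

```python
from typing import Callable, Mapping, Sequence

# (input_tokens, output_tokens, model) -> cost in USD
CostFn = Callable[[int, int, str], float]


def _total(rounds: Sequence[Mapping[str, int]], *fields: str) -> int:
    """Sum the named fields over all rounds, treating missing fields as 0."""
    return sum(int(r.get(f, 0)) for r in rounds for f in fields)


def summarize_cost(
    rounds: Sequence[Mapping[str, int]], estimate_cost: CostFn
) -> tuple[float, str]:
    """Return (total_cost, cost_breakdown) using per-model token totals."""
    sonnet_in = _total(rounds, "mutation_input_tokens", "execution_input_tokens")
    sonnet_out = _total(rounds, "mutation_output_tokens", "execution_output_tokens")
    haiku_in = _total(rounds, "judge_input_tokens")
    haiku_out = _total(rounds, "judge_output_tokens")

    if sonnet_in == 0 and haiku_in == 0:
        # Legacy changelog: only summed fields exist, so price everything as Sonnet.
        total = estimate_cost(
            _total(rounds, "input_tokens"), _total(rounds, "output_tokens"), "sonnet"
        )
        return total, "(Sonnet rates, estimate)"

    sonnet_cost = estimate_cost(sonnet_in, sonnet_out, "sonnet")
    haiku_cost = estimate_cost(haiku_in, haiku_out, "haiku")
    breakdown = f"(Sonnet: ${sonnet_cost:.2f} + Haiku: ${haiku_cost:.2f})"
    return sonnet_cost + haiku_cost, breakdown


def placeholder_estimate_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    """Stand-in cost function. The rates are placeholders, NOT real pricing."""
    rates_per_million = {"sonnet": (2.0, 10.0), "haiku": (1.0, 5.0)}
    in_rate, out_rate = rates_per_million[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

One caveat that follows from this logic: in a changelog that mixes old and new rounds, rounds without the per-stage fields add nothing to the split totals, so it may be worth deciding how such a file should be handled.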

---

## Verification

### Required tests
1. **All 150 existing tests PASS**: backward compatibility must not break.
2. **Confirm that OAuth authentication works:**
   ```bash
   cd /home/jay/workspace && python3 -c "
   from scripts.autoresearch.skill_executor import load_auth
   auth = load_auth()
   print('Auth type:', auth['type'])
   # 'oauth' for OAuth, 'api_key' for an API key
   "
   ```
3. **Confirm client creation:**
   ```bash
   cd /home/jay/workspace && python3 -c "
   from scripts.autoresearch.skill_executor import _create_client
   client = _create_client()
   print('Client created:', type(client))
   print('Has auth_token:', client.auth_token is not None)
   "
   ```
4. **Real API call (one small run):**
   ```bash
   cd /home/jay/workspace && python3 scripts/autoresearch/runner.py \
     --skill ad-creative \
     --checklist skills/ad-creative/evals/checklist.yaml \
     --test-input "보험 FA 모집 테스트" \
     --rounds 1 --dry-run
   ```
   → If dry-run does call the API, confirm that it works with a single round.
5. **Confirm the cost fields:** check that the changelog contains the new fields, such as `mutation_input_tokens` and `judge_input_tokens`.
6. **Zero pyright errors**

### Run test (after OAuth is confirmed working)
```bash
cd /home/jay/workspace && python3 scripts/autoresearch/runner.py \
  --skill ad-creative \
  --checklist skills/ad-creative/evals/checklist.yaml \
  --test-inputs-file skills/ad-creative/evals/test-inputs.yaml \
  --rounds 5 --target-score 0.90
```

On success, generate the report:
```bash
cd /home/jay/workspace && python3 scripts/autoresearch/reporter.py --skill ad-creative
```

---

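## Cautions
- ★ Modify only files under `scripts/autoresearch/`. Do not modify any other directory.
- ★ Keep the existing `api_key` parameter (backward compatibility). Add the new `auth` parameter as Optional.
- ★ Hardcode the path and structure of `~/.claude/.credentials.json`, but fall back gracefully if the file is missing.
- ★ Never expose the OAuth token in logs, reports, or stdout. Mask it, for example as `auth_token[:10]...`.
- ★ `load_auth()` must check token expiry (`expiresAt`). Skip an expired token and try the next source.
- ★ Check whether `add_round()` in `changelog.py` can accept kwargs. If it cannot, modify it.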
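
The masking rule above could be centralized in a small helper so that no call site formats a credential by hand. This is a sketch under assumptions: `mask_token` and `describe_auth` are hypothetical names, and applying the same masking to API keys goes slightly beyond what the note requires.

```python
def mask_token(token: str, visible: int = 10) -> str:
    """Return a log-safe form of a secret: a short prefix followed by '...'."""
    if len(token) <= visible:
        return "***"  # too short to show a prefix without revealing the whole value
    return token[:visible] + "..."


def describe_auth(auth: dict) -> str:
    """Build a log line for an auth dict without exposing the credential."""
    secret = auth.get("auth_token") or auth.get("api_key") or ""
    return f"type={auth.get('type')} credential={mask_token(secret)}"


# The token below is a made-up example value, not a real credential.
print(describe_auth({"type": "oauth", "auth_token": "sk-ant-oat01-EXAMPLEEXAMPLE"}))
```

Logging through `describe_auth()` rather than printing the dict directly keeps the full token out of stdout even if the dict gains more fields later.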