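# task: Fix the review-score parsing bug in card-news Review

## Goal
Fix the bug in the 5-stage pipeline where the score falls back to 0 when the Review stage runs on card-news content.
The text type passes normally, but card news always scores 0, which raises a RuntimeError.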

## Project path
`/home/jay/projects/ThreadAuto/`

## Root cause analysis (already done)

### 1. Pipeline score parsing (`five_stage_pipeline.py`, lines 89-95)
```python
if "total_score" in review_output:
    review_score = review_output["total_score"]
elif isinstance(review_output.get("evaluation"), dict) and "total_score" in review_output["evaluation"]:
    review_score = review_output["evaluation"]["total_score"]
else:
    review_score = review_output.get("score", 0)  # ← falls back to 0
```

Lookup order: top-level `total_score` → `evaluation.total_score` → `score` → 0.
For card news, if Claude omits the top-level `total_score`, the parser falls back to 0.

### 2. Review prompt (`prompts/05_review.md`)
The prompt includes a sample output JSON for the text type, but **none for the card-news type**.
→ For card news, Claude places `total_score` arbitrarily → parsing fails.

### 3. The `score` fallback problem
`score` is the "naturalness score (1-10)", whereas the pass mark is 60 (out of 70).
Even when `score` is picked up, 10 < 60, so it can never pass.
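
The findings above can be reproduced in isolation. The sketch below copies the lookup order from section 1 into a standalone function and runs it on two card-news-like payloads. `legacy_parse` is a hypothetical name, both payloads are invented for illustration, and the `>=` comparison against the pass mark is an assumption.

```python
PASS_THRESHOLD = 60  # pass mark out of 70, taken from section 3


def legacy_parse(review_output: dict):
    """Standalone copy of the current lookup order (hypothetical helper name)."""
    if "total_score" in review_output:
        return review_output["total_score"]
    elif isinstance(review_output.get("evaluation"), dict) and "total_score" in review_output["evaluation"]:
        return review_output["evaluation"]["total_score"]
    else:
        return review_output.get("score", 0)


# Invented payload: per-item scores exist, but total_score is missing everywhere.
no_total = {
    "evaluation": {
        "credibility": {"score": 8, "comment": "..."},
        "target_clarity": {"score": 8, "comment": "..."},
    }
}
# Same payload plus the top-level naturalness score (1-10).
with_naturalness = {**no_total, "score": 8}

for name, payload in [("no_total", no_total), ("with_naturalness", with_naturalness)]:
    parsed = legacy_parse(payload)
    # Assumption: a review passes when the parsed score is >= PASS_THRESHOLD.
    print(name, parsed, "pass" if parsed >= PASS_THRESHOLD else "fail")
```

Both payloads end up below the pass mark: the first parses as 0, the second as 8.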

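## Changes

### Fix 1: add a card-news output example to `prompts/05_review.md`
- Add a card-news output example next to the existing text-type output example
- **The top-level `total_score` field must be shown explicitly**
- Structure of the card-news example: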
```json
{
  "evaluation": {
    "first_line_stopping_power": {"score": 7, "comment": "..."},
    "target_clarity": {"score": 8, "comment": "..."},
    "loss_delivery": {"score": 6, "comment": "..."},
    "reversal_strength": {"score": 7, "comment": "..."},
    "credibility": {"score": 8, "comment": "..."},
    "save_worthiness": {"score": 7, "comment": "..."},
    "comment_inducing": {"score": 6, "comment": "..."}
  },
  "total_score": 49,
  "score": 8,
  "weakest_3": ["loss_delivery", "comment_inducing", "reversal_strength"],
  "issues": [],
  "fixed_content": {
    "slides": [...],
    "caption": "...",
    "hashtags": []
  },
  "needs_human_review": false,
  "review_notes": ""
}
```
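
When testing the updated prompt, a small shape check can confirm that a response follows the example. This is only a sketch: `check_cardnews_review` is a hypothetical helper, and it assumes `total_score` is the plain sum of the per-item scores, which matches the example (the seven items add up to 49) but is not stated as a rule in the prompt.

```python
from typing import List


def check_cardnews_review(output: dict) -> List[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []

    total = output.get("total_score")
    total_ok = isinstance(total, (int, float)) and not isinstance(total, bool)
    if not total_ok:
        problems.append("top-level total_score is missing or not a number")

    evaluation = output.get("evaluation")
    if not isinstance(evaluation, dict):
        problems.append("evaluation is missing or not an object")
        return problems

    # Assumption: total_score is the plain sum of the per-item scores.
    item_sum = sum(
        item["score"]
        for item in evaluation.values()
        if isinstance(item, dict) and isinstance(item.get("score"), (int, float))
    )
    if total_ok and total != item_sum:
        problems.append(f"total_score {total} != sum of item scores {item_sum}")
    return problems


# Scores copied from the example above; other fields omitted for brevity.
example = {
    "evaluation": {
        "first_line_stopping_power": {"score": 7, "comment": "..."},
        "target_clarity": {"score": 8, "comment": "..."},
        "loss_delivery": {"score": 6, "comment": "..."},
        "reversal_strength": {"score": 7, "comment": "..."},
        "credibility": {"score": 8, "comment": "..."},
        "save_worthiness": {"score": 7, "comment": "..."},
        "comment_inducing": {"score": 6, "comment": "..."},
    },
    "total_score": 49,
    "score": 8,
}

print(check_cardnews_review(example))  # []
```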

### Fix 2: harden the fallback logic in `content/five_stage_pipeline.py`
Make score parsing more robust:
```python
def _extract_review_score(self, review_output: dict) -> int:
    """Extract total_score from the Review output, searching several locations.

    Returns -1 when no usable score is found.
    """
    def _is_number(value) -> bool:
        # bool is a subclass of int in Python, so exclude True/False explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    # 1. Top-level total_score
    score = review_output.get("total_score")
    if _is_number(score):
        return int(score)

    # 2. evaluation.total_score
    evaluation = review_output.get("evaluation")
    if isinstance(evaluation, dict):
        score = evaluation.get("total_score")
        if _is_number(score):
            return int(score)

        # 3. Sum the per-item scores inside evaluation (new fallback)
        item_scores = [
            int(val["score"])
            for val in evaluation.values()
            if isinstance(val, dict) and _is_number(val.get("score"))
        ]
        if item_scores:
            return sum(item_scores)

    # 4. The top-level score is deliberately not used: it is the naturalness
    #    score (1-10), so it must not stand in for total_score (0-70).

    # 5. Final fallback: return -1 to signal a parse failure explicitly
    return -1

Handling -1 in the pipeline (inside the full-retry loop):
```python
review_score = self._extract_review_score(review_output)
if review_score == -1:
    logger.warning("Failed to parse Review score, review_output keys: %s", list(review_output.keys()))
    # Retry on parse failure (the format is the problem, not the content)
    continue
```

### Fix 3: raise `MAX_FULL_RETRIES`
- Current: `MAX_FULL_RETRIES = 1` (2 attempts in total)
- New: `MAX_FULL_RETRIES = 2` (3 attempts in total)
- Tolerates up to one parse failure plus one genuine quality failure
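
To see how the retry budget plays out, here is a minimal sketch of a retry loop that treats -1 as a format problem. The real loop in `five_stage_pipeline.py` is not shown in this task, so `run_review_with_retries`, `run_once`, `extract_score`, and `PASS_THRESHOLD` are hypothetical names, and passing at `>= 60` is an assumption.

```python
import logging

logger = logging.getLogger(__name__)

MAX_FULL_RETRIES = 2   # 3 attempts in total
PASS_THRESHOLD = 60    # assumed pass mark, out of 70


def run_review_with_retries(run_once, extract_score):
    """run_once() returns a review_output dict; extract_score() returns -1 on parse failure."""
    total_attempts = MAX_FULL_RETRIES + 1
    for attempt in range(1, total_attempts + 1):
        review_output = run_once()
        review_score = extract_score(review_output)
        if review_score == -1:
            # Format problem, not a content problem: spend an attempt and retry.
            logger.warning("attempt %d: score parse failed, keys: %s",
                           attempt, list(review_output.keys()))
            continue
        if review_score >= PASS_THRESHOLD:
            return review_output
        logger.info("attempt %d: score %d is below %d", attempt, review_score, PASS_THRESHOLD)
    raise RuntimeError(f"Review did not pass after {total_attempts} attempts")


# Canned outputs: one parse failure, one quality failure, then a pass.
canned = iter([{"score": 8}, {"total_score": 40}, {"total_score": 65}])
result = run_review_with_retries(lambda: next(canned),
                                 lambda out: out.get("total_score", -1))
print(result)  # {'total_score': 65}
```

With `MAX_FULL_RETRIES = 1`, the same sequence would exhaust its two attempts before the passing output and raise the RuntimeError.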

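## Tests
1. After changing the Review prompt, run the Review stage on its own with card-news content
2. Unit tests for `_extract_review_score()`:
   - Top-level total_score present → extracted as-is
   - Only evaluation.total_score present → extracted as-is
   - Only per-item scores inside evaluation → summed
   - No score anywhere → returns -1
   - A score value is a string (e.g. "8점", "8 points") and no numeric score is found elsewhere → returns -1
3. Confirm the existing text-type tests still pass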
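
The unit-test cases in item 2 could look roughly like the sketch below. To keep it runnable on its own, it tests a standalone copy of the proposed lookup without `self`. `extract_review_score` and the test names are hypothetical, the booleans check is an added assumption, and the real tests would call the pipeline method instead.

```python
def extract_review_score(review_output: dict) -> int:
    """Standalone copy of the proposed lookup (no self), for illustration only."""
    def is_number(value) -> bool:
        # Assumption: booleans are rejected even though bool subclasses int.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if is_number(review_output.get("total_score")):
        return int(review_output["total_score"])

    evaluation = review_output.get("evaluation")
    if isinstance(evaluation, dict):
        if is_number(evaluation.get("total_score")):
            return int(evaluation["total_score"])
        item_scores = [
            int(item["score"])
            for item in evaluation.values()
            if isinstance(item, dict) and is_number(item.get("score"))
        ]
        if item_scores:
            return sum(item_scores)

    return -1  # explicit "parse failed" marker


def test_top_level_total_score():
    assert extract_review_score({"total_score": 49, "score": 8}) == 49


def test_nested_total_score():
    assert extract_review_score({"evaluation": {"total_score": 52}}) == 52


def test_sum_of_item_scores():
    output = {"evaluation": {"credibility": {"score": 8}, "loss_delivery": {"score": 6}}}
    assert extract_review_score(output) == 14


def test_no_score_at_all():
    assert extract_review_score({"issues": []}) == -1


def test_string_total_score():
    assert extract_review_score({"total_score": "8점"}) == -1


def test_naturalness_score_is_ignored():
    # The top-level score (1-10) must never be used as total_score (0-70).
    assert extract_review_score({"score": 8}) == -1
```

The functions follow the pytest naming convention, but they are plain assertions and can also be called directly.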

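## Cautions
- When editing the 05_review.md prompt, do not touch the existing text-type example under any circumstances
- Only add the card-news example
- Do not misuse the `score` field (naturalness, 1-10) as total_score (0-70)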