# task-482.1 Report: Root Cause Analysis of Ghost Tasks (task-1.1, task-4.1)

**Author**: 라 (Ra) / dev3-team
**Date**: 2026-03-12
**Status**: Analysis complete (read-only)

---

## 1. Root Cause

### 1-A. PRIMARY: Test isolation flaw in test_dispatch.py (direct cause)

**File**: `/home/jay/workspace/tests/test_dispatch.py`

Some tests call `dispatch.dispatch()` without mocking `subprocess.run`.
As a result, the real `task-timer.py` runs as a subprocess and **writes to the REAL task-timers.json**.

**How the problem occurs:**
1. `dispatch_mod = _load_dispatch_with_workspace(tmp_path)` → patches WORKSPACE to tmp_path
2. However, the `TASK_TIMER` variable was already set to `/home/jay/workspace/memory/task-timer.py` at module load time → it is not patched
3. `generate_task_id()` → reads the tmp_path JSON (empty, or seeded with task-1, 2, 3) → returns "task-1.1" or "task-4.1"
4. `subprocess.run(["python3", REAL_task-timer.py, "start", "task-1.1", ...])` → actually executes
5. task-1.1/task-4.1 are recorded as "running" in the real task-timers.json
6. `_cleanup_task()` → deletes only from the tmp_path JSON (the real JSON is not cleaned up)
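
Steps 1 and 2 can be illustrated with a minimal sketch. It assumes `TASK_TIMER` is derived from `WORKSPACE` once, at import time; `fake_dispatch` and the sandbox path are hypothetical stand-ins, not the real dispatch.py.

```python
import types
from pathlib import PurePosixPath
from unittest.mock import patch

# Hypothetical stand-in for dispatch.py: TASK_TIMER is computed from
# WORKSPACE a single time, when the module is loaded.
fake_dispatch = types.ModuleType("fake_dispatch")
fake_dispatch.WORKSPACE = PurePosixPath("/home/jay/workspace")
fake_dispatch.TASK_TIMER = fake_dispatch.WORKSPACE / "memory" / "task-timer.py"

# Stands in for pytest's tmp_path (illustrative value only).
tmp_path = PurePosixPath("/tmp/pytest-sandbox")

# Patching WORKSPACE rebinds that one attribute; TASK_TIMER keeps the
# value it was given at load time.
with patch.object(fake_dispatch, "WORKSPACE", tmp_path):
    workspace_during_test = fake_dispatch.WORKSPACE
    timer_during_test = fake_dispatch.TASK_TIMER

print(workspace_during_test)  # /tmp/pytest-sandbox
print(timer_during_test)      # /home/jay/workspace/memory/task-timer.py
```

Under that assumption, any test that patches only `WORKSPACE` still hands the real script path to `subprocess.run`.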

**Affected tests (see section 3 for the proposed code changes)**:

| Test | Team | Description | Ghost ID created |
|---|---|---|---|
| `test_dispatch_nonexistent_project` (line 551) | dev1-team | no subprocess mock | task-1.1 |
| `test_dispatch_no_bot_key` (line 558) | dev1-team | no subprocess mock | task-1.1 |
| `test_dispatch_bot_key_none_exits` (line 565) | dev1-team | no subprocess mock | task-1.1 |
| `test_marketing_all_bots_busy_returns_error` (line 1143) | marketing | no subprocess mock; tmp JSON seeded with tasks 1-3 | task-4.1 |

In the case of `test_marketing_all_bots_busy_returns_error`:
- The tmp_path JSON is seeded with `{"task-1.1": running, "task-2.1": running, "task-3.1": running}`
- generate_task_id() → max=3 → returns **task-4.1**
- The real task-timer.py runs as a subprocess → task-4.1 (team=marketing, desc="마케팅 작업") is recorded in the real JSON
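
The ID arithmetic above can be sketched as follows. This is a hypothetical model of `generate_task_id()`, assuming it takes the highest major number in the store, adds one, and appends `.1`; the real implementation is not quoted in this report and may differ.

```python
import re

# Matches IDs of the form task-<major>.<minor>, e.g. "task-3.1".
TASK_ID_PATTERN = re.compile(r"^task-(\d+)\.(\d+)$")


def next_task_id(timers: dict) -> str:
    """Hypothetical model: highest major number in the store, plus one."""
    majors = [
        int(match.group(1))
        for key in timers
        if (match := TASK_ID_PATTERN.match(key))
    ]
    return f"task-{max(majors, default=0) + 1}.1"


# An empty tmp_path store, as in the three dev1-team tests.
empty_store = {}
# The store seeded by test_marketing_all_bots_busy_returns_error.
seeded_store = {
    "task-1.1": {"status": "running"},
    "task-2.1": {"status": "running"},
    "task-3.1": {"status": "running"},
}

print(next_task_id(empty_store))   # task-1.1
print(next_task_id(seeded_store))  # task-4.1
```

Because the IDs are computed from the tmp_path store, they restart from low numbers on every test run, which is why the same two ghost IDs keep reappearing.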

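**Evidence (app.log):** log messages are reproduced verbatim; `위임 시작` means "delegation started", `태스크 시작` means "task started", and the WARNING line reads "duplicate registration attempt rejected: task-1.1 is already in running state".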
```
[2026-03-12 01:12:30] [INFO] [dispatch] 위임 시작: team=dev1-team, task_id=task-1.1
[2026-03-12 01:12:30] [INFO] [__main__] 태스크 시작: task-1.1 (team=dev1-team)  ← real subprocess
[2026-03-12 01:12:30] [WARNING] [__main__] 이중 등록 시도 거부: task-1.1는 이미 running 상태
[2026-03-12 01:12:30] [INFO] [dispatch] 위임 시작: team=marketing, task_id=task-4.1
[2026-03-12 01:12:30] [INFO] [__main__] 태스크 시작: task-4.1 (team=marketing)  ← real subprocess
```

Timing: Hermes ran pytest while task-481.1 (the dispatch.py modification work) was in progress, between 01:12:30 and 01:12:35.

---

### 1-B. SECONDARY: start_task() in task-timer.py does not guard the "stale" status

**File**: `/home/jay/workspace/memory/task-timer.py`
**Location**: `start_task()` function (lines 111-120)

Current code:
```python
# Duplicate-registration guard: rejects only the "running" status
if existing and existing.get("status") == "running":
    return {"status": "already_running", ...}

# Prevent overwriting a completed task (added in task-464.1)
if existing and existing.get("status") == "completed":
    return {"status": "error", ...}

# ← No guard for the "stale" status! A stale entry is overwritten with "running"
```

If task-1.1 is "completed", re-creation is rejected.
If it is "stale" (the status cleanup_stale() assigns after 2 hours in running), re-creation is allowed.

**Reproduction cycle:**
```
[test] → task-1.1 recorded as running
[cleanup_stale, 2h later] → task-1.1 switched to stale
[next test run] → status is stale, so start_task() passes → task-1.1 rewritten as running
```
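
The cycle can be replayed with a simplified in-memory model of the two guards quoted above. The `"started"` return value and the timing-free `cleanup_stale()` are assumptions made for illustration; this is not the real task-timer.py.

```python
def start_task(timers: dict, task_id: str) -> dict:
    """Simplified model of the current guards in start_task()."""
    existing = timers.get(task_id)
    if existing and existing.get("status") == "running":
        return {"status": "already_running"}
    if existing and existing.get("status") == "completed":
        return {"status": "error"}
    # No branch for "stale": the existing entry is overwritten.
    timers[task_id] = {"status": "running"}
    return {"status": "started"}  # assumed success value


def cleanup_stale(timers: dict) -> None:
    """Model of the sweep: every running entry becomes stale (2h check omitted)."""
    for entry in timers.values():
        if entry["status"] == "running":
            entry["status"] = "stale"


timers = {}
first = start_task(timers, "task-1.1")    # first test run
cleanup_stale(timers)                     # two hours later
status_after_cleanup = timers["task-1.1"]["status"]
second = start_task(timers, "task-1.1")   # next test run

print(status_after_cleanup)               # stale
print(second["status"])                   # started
print(timers["task-1.1"]["status"])       # running
```

In this model the second call succeeds, so the ghost entry returns to "running" after every sweep.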

---

## 2. Reproduction Conditions

1. Hermes (dev1-team) or another team lead runs pytest with test_dispatch.py included
2. task-1.1 is absent from task-timers.json, or is in the "stale" or "reserved" status
3. task-4.1 is absent from task-timers.json, or is in the "stale" status
   (the tmp_path JSON is seeded with task-1.1, 2.1, 3.1, so generate_task_id() returns task-4.1)

**Reproduction frequency**: pytest runs with every dispatch.py modification task → the issue recurs each time

---

## 3. Proposed Fixes (code changes, at file/line level)

### Fix A (required, HIGH): test_dispatch.py - add subprocess.run mocking

**File**: `/home/jay/workspace/tests/test_dispatch.py`

**Change 1**: `test_dispatch_nonexistent_project` (lines 551-556)
```python
# Assumes `from unittest.mock import MagicMock, patch` at the top of the test file.
# Current (problematic):
with patch.object(dispatch_mod, "BOT_KEYS", {"dev1": "key1", "anu": "anu-key"}):
    result = dispatch_mod.dispatch("dev1-team", "작업", project_id="nonexistent-proj")

# Fixed (subprocess mock added; the parenthesized `with` form needs Python 3.10+):
with (
    patch.object(dispatch_mod, "subprocess") as mock_sub,
    patch.object(dispatch_mod, "BOT_KEYS", {"dev1": "key1", "anu": "anu-key"}),
):
    mock_sub.run.return_value = MagicMock(returncode=0, stdout="{}", stderr="")
    result = dispatch_mod.dispatch("dev1-team", "작업", project_id="nonexistent-proj")
```

**Change 2**: `test_dispatch_no_bot_key` (lines 558-563) - same pattern
**Change 3**: `test_dispatch_bot_key_none_exits` (lines 565-570) - same pattern
**Change 4**: `test_marketing_all_bots_busy_returns_error` (lines 1143-1163) - same pattern

Alternatively, patch TASK_TIMER in the `dispatch_mod` fixture so that the subprocess can no longer reach the real script:
```python
# Add to _load_dispatch_with_workspace():
_dispatch.WORKSPACE = tmp_path
_dispatch.TASK_TIMER = tmp_path / "memory" / "task-timer.py"  # ← add this line
```
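
As an optional safety net on top of either variant, a suite-wide guard could make any unmocked call fail loudly instead of silently writing to the real JSON. The sketch below is one possible shape, using only the standard library; `forbid_real_subprocess` and `UnmockedSubprocessError` are hypothetical names, not part of the existing test suite.

```python
import subprocess
from contextlib import contextmanager
from unittest.mock import patch


class UnmockedSubprocessError(AssertionError):
    """Raised when test code reaches subprocess.run without its own mock."""


def _refuse(*args, **kwargs):
    raise UnmockedSubprocessError(f"unmocked subprocess.run call: {args!r}")


@contextmanager
def forbid_real_subprocess():
    """Replace subprocess.run with a stub that raises for the duration of the block."""
    with patch.object(subprocess, "run", side_effect=_refuse) as guard:
        yield guard


def guarded_call_is_blocked() -> bool:
    """Return True if a call made inside the guard raises instead of executing."""
    try:
        with forbid_real_subprocess():
            subprocess.run(["python3", "task-timer.py", "start", "task-1.1"])
    except UnmockedSubprocessError:
        return True
    return False


print(guarded_call_is_blocked())  # True
```

In a pytest suite this could be wrapped in an autouse fixture in conftest.py. It only intercepts code that looks up `subprocess.run` through the module at call time; a module that did `from subprocess import run` would bypass it.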

### Fix B (recommended, MEDIUM): task-timer.py - add a stale-status guard

**File**: `/home/jay/workspace/memory/task-timer.py`
**Location**: `start_task()` function, after line 117

```python
# Prevent overwriting a completed task (task-464.1)
if existing and existing.get("status") == "completed":
    logger.warning(f"완료된 task 덮어쓰기 시도 거부: {task_id}")
    return {"status": "error", "reason": f"task_id '{task_id}' is already completed. Use a new ID."}

# ← Add below (Fix B):
# Prevent restarting a stale task
if existing and existing.get("status") == "stale":
    logger.warning(f"stale task 재시작 시도 거부: {task_id}")
    return {"status": "error", "reason": f"task_id '{task_id}' is in stale state. Use a new ID."}
```
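
If more terminal statuses are added later, the two consecutive `if` blocks could be folded into a single membership check. The following is a sketch of that design alternative, not the proposed patch itself; `NON_RESTARTABLE` and `check_restart` are hypothetical names, and the error message is simplified.

```python
# Statuses that start_task() should refuse to overwrite. "completed" is the
# guard from task-464.1; "stale" is the addition proposed in Fix B.
NON_RESTARTABLE = {"completed", "stale"}


def check_restart(existing, task_id: str):
    """Return a rejection dict if the entry must not be restarted, else None."""
    if not existing:
        return None
    status = existing.get("status")
    if status == "running":
        return {"status": "already_running"}
    if status in NON_RESTARTABLE:
        return {
            "status": "error",
            "reason": f"task_id '{task_id}' is in {status} state. Use a new ID.",
        }
    return None


print(check_restart({"status": "stale"}, "task-1.1"))
print(check_restart(None, "task-9.1"))  # None
```

A caller would proceed to write the "running" entry only when the function returns `None`.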

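### Fix C (immediate mitigation): handling the current ghost tasks

task-1.1 and task-4.1 are currently "completed" → re-creation is blocked on the next test run.
**Do not delete them**: deleting them removes the "completed" guard, and they will be re-created.
Keeping them in the "completed" status is recommended until Fixes A and B are applied.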
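
A read-only check along these lines could confirm that the guard is still in place before each pytest run. It assumes task-timers.json is a top-level mapping from task ID to an entry with a `status` field, which this report implies but does not show; the helper names are hypothetical.

```python
import json

GHOST_IDS = ("task-1.1", "task-4.1")


def ghost_statuses(timers: dict, ids=GHOST_IDS) -> dict:
    """Map each ghost ID to its status, or "missing" if the entry is absent."""
    return {tid: (timers.get(tid) or {}).get("status", "missing") for tid in ids}


def still_protected(timers: dict) -> bool:
    """True only while every ghost ID is present and marked completed."""
    return all(status == "completed" for status in ghost_statuses(timers).values())


# Illustrative content only; the real file would be loaded with json.load().
sample = json.loads(
    '{"task-1.1": {"status": "completed"}, "task-4.1": {"status": "completed"}}'
)

print(still_protected(sample))                # True
print(ghost_statuses({"task-1.1": {"status": "stale"}}))
```

A `False` result would indicate that an entry was deleted or changed and that the next unmocked test run can re-create it.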

---

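## 4. Investigation Scope Summary

| Item investigated | Result |
|---|---|
| Full audit of write paths to task-timers.json | Unmocked tests in test_dispatch.py → confirmed |
| dispatch.py followup mechanism | Infinite loop possible because .done.clear was not checked (fixed in task-481.1) |
| Tracing of task-timer.py start calls | Real subprocess calls confirmed in 4 tests in test_dispatch.py |
| cron schedule check | No cron jobs currently registered (unrelated to task-1.1/4.1) |
| Current state of task-timers.json | task-1.1 and task-4.1 are both "completed" (completed manually) |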

---

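## 5. Additional Finding: followup infinite loop (fixed in task-481.1)

Problem in the previous version of `dispatch.py _register_followup()`:
- If `.done` is missing → "in progress" + a recheck scheduled 2 minutes later → endless rechecking
- Fixed in task-481.1 by adding a `.done.clear` check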

---

## QC Verification Result

```json
{
  "task_id": "task-482.1",
  "verified_at": "2026-03-12T07:07:53",
  "overall": "PASS (conditional)",
  "checks": {
    "file_check": "PASS (report generated)",
    "data_integrity": "PASS",
    "api_health": "SKIP (not a server task)",
    "test_runner": "SKIP (analysis task, no code changes)",
    "tdd_check": "N/A (Lv.1 analysis task)",
    "schema_contract": "SKIP (no changes to workers)"
  }
}
```

---

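## QC Self-Checklist

- [x] 1. Does this change affect other files? → Analysis only, no code changes
- [x] 2. Edge cases? → Both the stale and deleted states were analyzed
- [x] 3. Matches the task instructions? → Root cause, reproduction conditions, and proposed fixes are all included
- [x] 4. Security check? → Contents of .env.keys are not included
- [x] 5. Test coverage? → Not applicable to an analysis task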