AbeBhatti committed on
Commit b4e7ad1 · 1 parent: c922dcd

Clean repo — code only, no weights or training data

Files changed (46)
  1. .gitignore +23 -16
  2. proj_context.md +287 -0
  3. session_progress.md +310 -0
  4. training/bluff_training.log +0 -16
  5. training/checkpoints/bluff_classifier_tokenizer/tokenizer.json +0 -0
  6. training/checkpoints/bluff_classifier_tokenizer/tokenizer_config.json +0 -14
  7. training/checkpoints/phase2_final/README.md +0 -67
  8. training/checkpoints/phase2_final/chat_template.jinja +0 -15
  9. training/checkpoints/phase2_final/checkpoint-100/chat_template.jinja +0 -15
  10. training/checkpoints/phase2_final/checkpoint-100/config.json +0 -32
  11. training/checkpoints/phase2_final/checkpoint-100/generation_config.json +0 -9
  12. training/checkpoints/phase2_final/checkpoint-100/tokenizer.json +0 -0
  13. training/checkpoints/phase2_final/checkpoint-100/tokenizer_config.json +0 -19
  14. training/checkpoints/phase2_final/checkpoint-100/trainer_state.json +0 -304
  15. training/checkpoints/phase2_final/checkpoint-200/chat_template.jinja +0 -15
  16. training/checkpoints/phase2_final/checkpoint-200/config.json +0 -32
  17. training/checkpoints/phase2_final/checkpoint-200/generation_config.json +0 -9
  18. training/checkpoints/phase2_final/checkpoint-200/tokenizer.json +0 -0
  19. training/checkpoints/phase2_final/checkpoint-200/tokenizer_config.json +0 -19
  20. training/checkpoints/phase2_final/checkpoint-200/trainer_state.json +0 -574
  21. training/checkpoints/phase2_final/config.json +0 -32
  22. training/checkpoints/phase2_final/generation_config.json +0 -9
  23. training/checkpoints/phase2_final/tokenizer.json +0 -0
  24. training/checkpoints/phase2_final/tokenizer_config.json +0 -19
  25. training/checkpoints/unified_final/README.md +0 -67
  26. training/checkpoints/unified_final/chat_template.jinja +0 -15
  27. training/checkpoints/unified_final/checkpoint-100/chat_template.jinja +0 -15
  28. training/checkpoints/unified_final/checkpoint-100/config.json +0 -32
  29. training/checkpoints/unified_final/checkpoint-100/generation_config.json +0 -9
  30. training/checkpoints/unified_final/checkpoint-100/tokenizer.json +0 -0
  31. training/checkpoints/unified_final/checkpoint-100/tokenizer_config.json +0 -19
  32. training/checkpoints/unified_final/checkpoint-100/trainer_state.json +0 -304
  33. training/checkpoints/unified_final/checkpoint-200/chat_template.jinja +0 -15
  34. training/checkpoints/unified_final/checkpoint-200/config.json +0 -32
  35. training/checkpoints/unified_final/checkpoint-200/generation_config.json +0 -9
  36. training/checkpoints/unified_final/checkpoint-200/tokenizer.json +0 -0
  37. training/checkpoints/unified_final/checkpoint-200/tokenizer_config.json +0 -19
  38. training/checkpoints/unified_final/checkpoint-200/trainer_state.json +0 -574
  39. training/checkpoints/unified_final/config.json +0 -32
  40. training/checkpoints/unified_final/generation_config.json +0 -9
  41. training/checkpoints/unified_final/tokenizer.json +0 -0
  42. training/checkpoints/unified_final/tokenizer_config.json +0 -19
  43. training/checkpoints/unified_final/unified_reward_log.json +0 -810
  44. training/unified_training.log +0 -269
  45. wandb/debug-cli.rayyan.log +0 -0
  46. wandb/settings +3 -0
.gitignore CHANGED
@@ -1,20 +1,27 @@
- venv/
- .venv/
- __pycache__/
- *.pyc
- *.pth
- wandb/
- grpo_output/
+ # Model weights
+ *.safetensors
+ *.bin
  *.pt
+ *.pth
+
+ # Training data
+ training/data/
  selfplay_states.json
  selfplay_states_test.json
- *.png
- .env
- proj_context.md
- session_progress.md
- HF_TOKEN
- *.safetensors
- *.bin
- *.safetensors
- *.bin
+
+ # Poker data
  training/data/poker/
+
+ # Training logs and outputs
+ training/unified_training.log
+ training/bluff_training.log
+ grpo_output/
+
+ # Checkpoints
+ training/checkpoints/
+
+ # Python
+ __pycache__/
+ *.pyc
+ .venv/
+ venv/
proj_context.md ADDED
@@ -0,0 +1,287 @@
+ # ArbitrAgent — Project Context
+ **Read this file at the start of every session. Do not modify it.**
+ **After completing your session, update `session_progress.md` with your session number and what you built.**
+
+ ---
+
+ ## What We Are Building
+
+ **ArbitrAgent** is a curriculum-trained negotiation agent that autonomously executes multi-route arbitrage on simulated Craigslist-style markets. It starts with a cash budget ($20), identifies high-value items, simultaneously opens negotiations across multiple buy candidates and downstream trade targets, and only commits capital once a confirmed profitable route is locked.
+
+ Built for the **OpenEnv Hackathon, March 7–8, 2026** at Shack15, San Francisco.
+
+ **Submission deadline: Sunday March 8, 1:00 PM sharp.**
+
+ ---
+
+ ## ✅ Already Built — Do Not Rebuild
+
+ A teammate completed the following before the hackathon started. Every session must read this before touching any ML or environment code.
+
+ | Component | Details |
+ |-----------|---------|
+ | `/home/rayyan/Desktop/Play-gent/reward_model.pt` | DistilBERT fine-tuned on Diplomacy data, val loss 0.102 |
+ | `DiplomacyNegotiationEnv` | OpenEnv 0.2.1 compliant, inherits from real Env base class |
+ | `ContractorNegotiationEnv` | OpenEnv 0.2.1 compliant, inherits from real Env base class |
+ | `/home/rayyan/Desktop/Play-gent/selfplay_states.json` | 211,278 labeled Diplomacy game states |
+ | `/home/rayyan/Desktop/Play-gent/grpo_output/checkpoint-200/model.safetensors` | TinyLlama 1.1B, GRPO Phase 1 trained, reward curve -0.35 → +0.63 over 200 steps |
+
+ **Saturday only requires:** Phase 2 GRPO training (~1.5 hrs), agent loop, seller sims, and demo UI. The hard ML work is done.
+
+ ---
+
+ Real negotiation data is private and will never exist as training data. We extract negotiation judgment from two games that together cover the complete negotiation skill surface:
+
+ - **Diplomacy** → multi-party coalition sequencing, strategic information reveals, long-horizon concession planning, stopping policy
+ - **Poker** → bluff detection, behavioral pattern reading, pressure calibration, EV reasoning, clean exits
+
+ **The combined skill neither game alone produces:** detecting a bluff AND immediately deploying coalition pressure at exactly that moment. That is the demo's proof of training.
+
+ The training pipeline implements this in three phases: Diplomacy (Phase 1, ✅ complete), Contractor negotiation as an intermediate bluff-detection layer (Phase 2, 🔲 MVP), and full Poker training on the IRC Poker dataset (Phase 3, 🔲 post-MVP). The pitch is true at MVP and becomes fully implemented at Phase 3.
+
+ ---
+
+ ## Repository Structure
+
+ ```
+ arbitragent/
+ ├── proj_context.md              # This file — never modify
+ ├── session_progress.md          # Updated by each session
+ ├── envs/
+ │   ├── diplomacy_env.py         # ✅ BUILT — DiplomacyNegotiationEnv (OpenEnv 0.2.1)
+ │   ├── contractor_env.py        # ✅ BUILT — ContractorNegotiationEnv (OpenEnv 0.2.1)
+ │   └── poker_env.py             # 🔲 POST-MVP — PokerNegotiationEnv (OpenEnv 0.2.1)
+ ├── training/
+ │   ├── reward_model.py          # ✅ BUILT — DistilBERT reward model (val loss 0.102)
+ │   ├── checkpoints/             # 🔲 TODO — optional future consolidation of checkpoints
+ │   │   ├── phase2_final.pt      # 🔲 TODO — after Session B2
+ │   │   └── phase3_final.pt      # 🔲 POST-MVP — after Session B3
+ │   ├── data/                    # 🔲 TODO — optional future data subfolder
+ │   │   └── (see root-level files for existing data artifacts)
+ │   ├── train_phase1.py          # ✅ BUILT — GRPO on Diplomacy env (done, -0.35→+0.63)
+ │   ├── train_phase2.py          # 🔲 TODO — GRPO on Contractor env (Session B2)
+ │   ├── train_phase3.py          # 🔲 POST-MVP — GRPO on Poker env (Session B3)
+ │   └── arbitragent_colab.ipynb  # 🔲 TODO — End-to-end Colab notebook (Session B2)
+ ├── agent/
+ │   ├── arbitragent.py           # Main agent orchestration loop (5 phases)
+ │   ├── route_graph.py           # Route graph: confirmed/soft/dead edges + scoring
+ │   └── bluff_detector.py        # Signal extraction: timing/size/formulaic/pattern tells
+ ├── simulation/
+ │   ├── seller_sim.py            # CraigslistSellerSim — LLM-backed seller counterparts
+ │   ├── seller_profiles.py       # All 4 archetype profiles + listing library
+ │   └── scenario.py              # Demo scenario: which seller ghosts, when bluff triggers
+ ├── demo/
+ │   ├── run_demo.py              # Entry point — takes budget, runs full agent loop
+ │   └── display.py               # Rich terminal output showing live negotiation threads
+ └── deploy/
+     └── hf_spaces_app.py         # HuggingFace Spaces deployment wrapper
+ ```
+
+ ---
+
+ ## Training Architecture
+
+ ### MVP (Submit This)
+
+ ```
+ Phase 1: Diplomacy Training ✅ COMPLETE
+   211,278 labeled Diplomacy game states
+   → Reward model (DistilBERT) trained, val loss 0.102
+   → GRPO training on TinyLlama 1.1B: 200 steps
+   → Reward curve: -0.35 → +0.63
+   → Checkpoint saved: `/home/rayyan/Desktop/Play-gent/grpo_output/checkpoint-200/model.safetensors`
+
+ Phase 2: Contractor Curriculum Training 🔲 TODO — Session B2
+   Contractor negotiation scenarios (false-floor, pressure calibration, timing tells)
+   → Continue GRPO from phase1_final.pt — do NOT reinitialize weights
+   → 200 additional steps
+   → Bluff detection accuracy must improve on held-out test set
+   → Save checkpoint: training/checkpoints/phase2_final.pt
+
+ MVP Model: TinyLlama 1.1B, Diplomacy + Contractor trained
+ ```
+
+ ### Post-MVP (If Time Allows — Phase 3)
+
+ ```
+ Phase 3: Poker Curriculum Training 🔲 POST-MVP — Session B3
+   IRC Poker Database (free, 10M+ hands, no collection needed)
+   → Replay hands as negotiation scenarios
+   → Map bet sizing → negotiation pressure
+   → Map bluff/fold signals → position authenticity reads
+   → Continue GRPO from phase2_final.pt — do NOT reinitialize weights
+   → 200 additional steps
+   → Reward: EV of outcome vs. EV of folding
+   → Save checkpoint: training/checkpoints/phase3_final.pt
+
+ Full Model: TinyLlama 1.1B, Diplomacy + Contractor + Poker trained
+ ```
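The Phase 3 reward line ("EV of outcome vs. EV of folding") can be read as the chip advantage of continuing over folding. The sketch below assumes each replayed decision is summarized by a win probability, the pot, and the cost to call, and that folding is worth 0 from that point on. `PokerDecision`, `ev_call`, `phase3_reward`, and `bet_pressure` are hypothetical names, and the bet-to-pressure mapping is one illustrative choice; none of this comes from `train_phase3.py`, which is still a post-MVP TODO.

```python
from dataclasses import dataclass


@dataclass
class PokerDecision:
    """Hypothetical summary of one decision point in a replayed hand."""
    win_prob: float   # estimated probability of winning if we continue
    pot: float        # chips in the pot, including the opponent's bet
    call_cost: float  # chips we must put in to continue


def ev_call(d: PokerDecision) -> float:
    """Expected chips from calling: win the pot, or lose the call."""
    return d.win_prob * d.pot - (1.0 - d.win_prob) * d.call_cost


def phase3_reward(d: PokerDecision) -> float:
    """EV of continuing minus EV of folding (folding risks no more chips)."""
    ev_fold = 0.0
    return ev_call(d) - ev_fold


def bet_pressure(bet: float, pot: float) -> float:
    """Map a bet to a 0-1 pressure score: its share of the resulting pot."""
    total = pot + bet
    return bet / total if total > 0 else 0.0
```

Under this sketch, calling 50 chips into a 100-chip pot with a 50% win chance scores +25, while the same call at 20% scores -20, so the reward favors the clean exit.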
+
+ **Build Phase 3 only after:** Phase 2 is complete, the demo runs end-to-end, and the submission checklist is green. Phase 3 makes the implementation match the pitch exactly: the story becomes true all the way down. Estimated time: ~2 hours to build PokerNegotiationEnv + ~1.5 hours training on DGX.
+
+ **Why curriculum order matters:** Diplomacy builds the multi-party strategic foundation. Contractor adds false-floor detection on top of that. Poker sharpens the bluff-reading layer with pure behavioral signal. Each phase builds on the last. Running them simultaneously or out of order causes catastrophic forgetting.
+
+ **Why TinyLlama 1.1B and not LLaMA 3.1 8B:** Training time. 8B on the DGX Spark would take 17–24 hours for two phases alone — the entire hackathon gone on training. TinyLlama 1.1B completes all three phases in ~5 hours total, with Phase 1 already done. Do not switch to 8B.
+
+ ---
+
+ ## Tech Stack (LOCKED)
+
+ | Component | Technology | Status |
+ |-----------|-----------|--------|
+ | Agent LLM | TinyLlama 1.1B (trained policy) | ✅ Phase 1 trained |
+ | Phase 1 Env | DiplomacyNegotiationEnv (OpenEnv 0.2.1) | ✅ Built |
+ | Phase 2 Env | ContractorNegotiationEnv (OpenEnv 0.2.1) | ✅ Built |
+ | Phase 3 Env | PokerNegotiationEnv (OpenEnv 0.2.1) | 🔲 Post-MVP |
+ | Poker Data | IRC Poker Database (free, 10M+ hands) | 🔲 Post-MVP |
+ | Reward Model | DistilBERT, val loss 0.102 | ✅ Built |
+ | RL Framework | TRL + GRPO | ✅ Phase 1 complete |
+ | Training Data | `/home/rayyan/Desktop/Play-gent/selfplay_states.json`, 211,278 states | ✅ Built |
+ | Seller Simulation | TinyLlama 1.1B with archetype system prompts | 🔲 Session C1 |
+ | Route Graph | NetworkX or custom dict-based | 🔲 Session A2 |
+ | Agent Loop | 5-phase orchestration | 🔲 Session A2 |
+ | Bluff Detector | 4-signal extractor | 🔲 Session A3 |
+ | Demo UI | Rich terminal display | 🔲 Session A4 |
+ | Experiment Tracking | Weights & Biases | ✅ Active |
+ | Deployment | HuggingFace Spaces + HF Model Hub | 🔲 Session A4 |
+ | Hardware | DGX Spark (all training + inference) | ✅ Available |
+ | Colab Notebook | End-to-end training script | 🔲 Session B2 |
+
+ ---
+
+ ## The Five-Phase Agent Loop
+
+ ### Phase 1: Scout
+ - Query simulated listings for $15–$25 items
+ - Score each on: resale demand, trade liquidity, seller bluff probability
+ - Select top 3 buy candidates
+ - Open soft-inquiry negotiations with all 3 simultaneously
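The Scout bullets above name three scoring inputs but no formula. A minimal sketch, assuming each input is already normalized to 0-1 and combined with a weighted sum. The `Listing` fields, the `WEIGHTS` values, and the choice to treat bluff probability as upside (more room below the asking price) are all assumptions, not taken from the project code.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical listing record; scores are normalized to 0-1."""
    item: str
    price: float
    resale_demand: float    # how easily the item resells
    trade_liquidity: float  # how many trade targets want it
    bluff_prob: float       # estimated chance the seller's stated floor is a bluff


# Assumed weights; bluff_prob counts as upside (negotiation room below the ask).
WEIGHTS = {"resale_demand": 0.5, "trade_liquidity": 0.3, "bluff_prob": 0.2}


def scout_score(listing: Listing) -> float:
    """Weighted sum of the three Scout criteria."""
    return (WEIGHTS["resale_demand"] * listing.resale_demand
            + WEIGHTS["trade_liquidity"] * listing.trade_liquidity
            + WEIGHTS["bluff_prob"] * listing.bluff_prob)


def select_buy_candidates(listings: list[Listing], k: int = 3,
                          lo: float = 15.0, hi: float = 25.0) -> list[Listing]:
    """Keep listings priced in [lo, hi] and return the top-k by score."""
    in_band = [x for x in listings if lo <= x.price <= hi]
    return sorted(in_band, key=scout_score, reverse=True)[:k]
```

Filtering by price before ranking keeps an attractive but out-of-budget item from crowding out affordable candidates.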
+
+ ### Phase 2: Route Mapping
+ - For each candidate, identify 2–3 trade targets in the $35–$80 range
+ - Open parallel trade-interest threads
+ - Build route graph — edges: Confirmed / Soft / Dead
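The route graph only needs three edge states. A dict-based sketch (the Tech Stack table allows either NetworkX or a custom dict); `RouteGraph`, `EdgeStatus`, and the method names are hypothetical, not the API of `agent/route_graph.py`.

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeStatus(Enum):
    CONFIRMED = "confirmed"  # counterpart agreed to terms
    SOFT = "soft"            # expressed interest, nothing locked
    DEAD = "dead"            # ghosted, refused, or pruned


@dataclass
class RouteGraph:
    """Hypothetical graph: buy candidate -> {trade target: edge status}."""
    edges: dict[str, dict[str, EdgeStatus]] = field(default_factory=dict)

    def set_edge(self, buy: str, target: str, status: EdgeStatus) -> None:
        """Create or update the edge between a buy candidate and a target."""
        self.edges.setdefault(buy, {})[target] = status

    def live_targets(self, buy: str) -> list[str]:
        """Targets for this candidate whose edge is not dead."""
        return [t for t, s in self.edges.get(buy, {}).items()
                if s is not EdgeStatus.DEAD]

    def confirmed_routes(self) -> list[tuple[str, str]]:
        """All (buy, target) pairs with a confirmed edge."""
        return [(b, t) for b, targets in self.edges.items()
                for t, s in targets.items() if s is EdgeStatus.CONFIRMED]
```

Marking an edge dead rather than deleting it keeps a record of which counterparts already ghosted, which the pattern tell and later pivots can reuse.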
+
+ ### Phase 3: Pressure and Confirm
+ - Use downstream confirmations as upstream leverage
+ - Run bluff detection on seller responses
+ - Lock soft commits before committing capital
+ - Kill routes below the confirmation probability threshold
+
+ ### Phase 4: Route Scoring
+ ```python
+ route_score = (
+     (confirmed_exit_value - entry_cost)
+     * route_confirmation_probability
+     * seller_reliability_score
+ )
+ kill_route = route_score < minimum_threshold  # kill if below threshold
+ ```
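To make the scoring rule concrete, here it is as a function plus the kill-and-select step from Phases 3 and 5. Only the $24 entry and $52 exit come from the demo inject; the 0.9 and 0.8 probabilities and the "lamp" route are placeholders chosen for illustration, and `pick_route` with its tuple layout is hypothetical.

```python
from typing import Optional


def route_score(confirmed_exit_value: float, entry_cost: float,
                route_confirmation_probability: float,
                seller_reliability_score: float) -> float:
    """Profit of a route, discounted by confirmation odds and seller reliability."""
    return ((confirmed_exit_value - entry_cost)
            * route_confirmation_probability
            * seller_reliability_score)


def pick_route(routes: dict[str, tuple[float, float, float, float]],
               minimum_threshold: float) -> Optional[str]:
    """Drop routes scoring below the threshold; return the best survivor, if any."""
    scored = {name: route_score(*params) for name, params in routes.items()}
    alive = {name: s for name, s in scored.items() if s >= minimum_threshold}
    return max(alive, key=alive.get) if alive else None


routes = {
    "camera": (52.0, 24.0, 0.9, 0.8),  # exit, entry, confirm prob, reliability
    "lamp": (40.0, 20.0, 0.3, 0.5),
}
print(pick_route(routes, minimum_threshold=5.0))
```

With these placeholder probabilities the camera route scores (52 - 24) × 0.9 × 0.8 = 20.16 and the lamp route 3.0, so the lamp is killed. The raw return multiple of the camera route, 52 / 24 ≈ 2.17, is what the demo inject rounds to 2.2x.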
+
+ ### Phase 5: Execute
+ - Pull the trigger on the highest-scored confirmed route
+ - Complete downstream trade
+ - Log final value vs. starting budget
+
+ ---
+
+ ## The Four Seller Archetypes
+
+ | Archetype | Response Prob | Floor Behavior | Trade Openness | Demo Purpose |
+ |-----------|--------------|----------------|----------------|--------------|
+ | Motivated Seller | 0.90 | Real floor, honest | High | Shows clean close |
+ | Bluffer | 0.85 | Says "firm" with 30% room left | Medium | Shows poker layer catching the tell |
+ | Ghoster | 0.35 | Never reaches floor | Low | Shows agent detecting dead route, pivoting |
+ | Trade-Curious | 0.80 | Cash-resistant, trade-open | Very High | Shows agent switching offer type |
+
+ ### Bluff Detection Signals (all four must be checked)
+ 1. **Timing tell** — response came in under 1 turn (prepared script, not genuine constraint)
+ 2. **Size tell** — concession is a round number (anchoring, not real floor)
+ 3. **Formulaic tell** — canned phrasing: "lowest I can go", "final offer", "can't go lower"
+ 4. **Pattern tell** — behavior inconsistent with their earlier thread history
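The four tells can be checked with simple rules before any model is involved. A rule-based sketch, with several assumptions called out: `turns_to_respond == 0` stands for "under 1 turn", a "round number" means a multiple of $5, the pattern tell is approximated as "was conceding earlier, now claims a final price", and the 3-of-4 flagging rule is arbitrary. `SellerTurn`, `bluff_signals`, and `is_likely_bluff` are hypothetical, not the contents of `agent/bluff_detector.py`.

```python
from dataclasses import dataclass
from typing import Optional

# Canned phrases from the formulaic tell; matching is case-insensitive.
FORMULAIC_PHRASES = ("lowest i can go", "final offer", "can't go lower")


@dataclass
class SellerTurn:
    """Hypothetical record of one seller message."""
    text: str
    turns_to_respond: int          # 0 = replied within the same turn
    offer: Optional[float] = None  # price stated in the message, if any


def bluff_signals(turn: SellerTurn, history: list[SellerTurn],
                  fast_threshold: int = 1,
                  round_step: float = 5.0) -> dict[str, bool]:
    """Evaluate the four tells for `turn`, given earlier turns in the thread."""
    text = turn.text.lower()
    formulaic = any(phrase in text for phrase in FORMULAIC_PHRASES)
    # Timing: reply arrived in under `fast_threshold` turns.
    timing = turn.turns_to_respond < fast_threshold
    # Size: stated price is a multiple of `round_step` (e.g. $30).
    size = turn.offer is not None and turn.offer % round_step == 0
    # Pattern: seller was conceding earlier, then suddenly claims a hard stop.
    prior = [t.offer for t in history if t.offer is not None]
    was_conceding = any(later < earlier for earlier, later in zip(prior, prior[1:]))
    pattern = formulaic and was_conceding
    return {"timing": timing, "size": size,
            "formulaic": formulaic, "pattern": pattern}


def is_likely_bluff(signals: dict[str, bool], min_tells: int = 3) -> bool:
    """Flag a bluff when at least `min_tells` of the four tells fire."""
    return sum(signals.values()) >= min_tells
```

On an illustrative thread ($45 ask, then $38, then an instant "This is my final offer, $30."), all four tells fire; a slower, non-round "I think 27 works for me" fires none.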
+
+ ### The Critical Demo Inject
+ At ~60 seconds into the demo, the Bluffer seller says "this is my final offer" on the vintage camera at $30. This response contains all four tells. The trained model flags it, shows reasoning trace, and deploys coalition pressure: *"I have a trade offer from another seller that makes this less urgent for me — can you do $22?"* Seller concedes to $24. Route executes. Final value: $52 on $24 deployed = 2.2x.
+
+ **Baseline LLaMA accepts the $30 "final offer" at face value. The trained model doesn't. That gap is the proof.**
+
+ ---
+
+ ## Seller Profile Schema
+
+ ```python
+ {
+     "id": "seller_001",
+     "item": "vintage film camera",
+     "listing_price": 45,
+     "floor": 28,                 # hidden from agent
+     "archetype": "bluffer",
+     "bluff_room": 0.30,          # still has 30% room when says "final offer"
+     "response_prob": 0.85,
+     "response_speed": "fast",    # fast | slow | flaky
+     "trade_openness": 0.6,
+     "personality": "Casual seller, slightly impatient. Texts in short bursts.",
+     "tells": ["round numbers", "formulaic language", "too-fast response"]
+ }
+ ```
+
+ ### Response Turn Simulation
+ ```python
+ RESPONSE_PROFILES = {
+     "fast": {"turns_to_respond": 1, "ghost_prob": 0.10},
+     "slow": {"turns_to_respond": 3, "ghost_prob": 0.30},
+     "flaky": {"turns_to_respond": 2, "ghost_prob": 0.60},
+ }
+ ```
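A profile like those above only needs two operations: decide whether a message is ghosted, and if not, when the reply lands. A minimal sketch, assuming each message is ghosted independently with probability `ghost_prob`; `schedule_reply` is a hypothetical helper, and `simulation/seller_sim.py` may model this differently. Passing a seeded `random.Random` (the demo scenario seeds its RNG to 42) keeps runs reproducible.

```python
import random
from typing import Optional

RESPONSE_PROFILES = {
    "fast": {"turns_to_respond": 1, "ghost_prob": 0.10},
    "slow": {"turns_to_respond": 3, "ghost_prob": 0.30},
    "flaky": {"turns_to_respond": 2, "ghost_prob": 0.60},
}


def schedule_reply(speed: str, current_turn: int,
                   rng: random.Random) -> Optional[int]:
    """Return the turn a reply lands on, or None if the seller ghosts."""
    profile = RESPONSE_PROFILES[speed]
    if rng.random() < profile["ghost_prob"]:
        return None  # ghosted: this message never gets an answer
    return current_turn + profile["turns_to_respond"]


rng = random.Random(42)
print([schedule_reply("flaky", turn, rng) for turn in range(5)])
```

Injecting the RNG instead of calling the module-level `random` functions lets a test force either branch and lets the demo replay the same ghosting pattern.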
+
+ ---
+
+ ## Hackathon Tracks Hit
+
+ | Track | How |
+ |-------|-----|
+ | Statement 1: Multi-Agent | Agent manages 9–12 simultaneous counterpart LLMs |
+ | Statement 2: Long-Horizon | Route-confirmation arc spans multiple rounds with full state tracking |
+ | Statement 4: Self-Improvement | Curriculum RL loop, two-phase measurable reward improvement |
+ | Statement 5: Wild Card | Autonomous capital deployment via confirmed route arbitrage |
+ | Halluminate $10k bonus | Agent managing multiple actors to discover and achieve the task |
+ | Fleet AI $10k bonus | Bluff detection layer as oversight agent scoring counterpart behavior |
+
+ ---
+
+ ## The Pitch (memorize this)
+
+ > "The most important negotiations of your life happen once. The person across the table has done it hundreds of times. The data to train AI on these conversations is sealed by law and will never exist. We found where that judgment already lives at massive scale: in Diplomacy, where millions of humans practiced multi-party coalition strategy, and in Poker, where millions more learned to read when someone's stated position is real versus a bluff. We trained on both — curriculum style — and built an agent that doesn't just know negotiation theory. It has internalized when to move, when to wait, and when the other side is lying about their floor. Then we gave it $20 and let it run."
+
+ ---
+
+ ## Judge Q&A (have these ready)
+
+ **"Couldn't you just prompt GPT-4 to do this?"**
+ GPT-4 knows negotiation tactics abstractly. It has no learned behavioral policy about *when* to deploy them. It hasn't lost thousands of negotiations by revealing coalition pressure too early. Our model has — and the reward curves are the proof.
+
+ **"Does game training actually transfer to real negotiation?"**
+ The structural isomorphism is direct. Coalition sequencing in Diplomacy is mechanically identical to sequential offer reveals in any multi-party negotiation. Bluff detection in contractor bidding scenarios — reading whether a contractor's stated floor is real — is mechanically identical to the same skill in any negotiation. We're not claiming domain transfer — we're claiming the cognitive mechanics are identical across surface vocabulary.
+
+ **"Why simulate instead of real Craigslist?"**
+ Craigslist has 6-hour response latency, no API, and one ghost kills a live demo. Our parameterized LLM counterparts replicate the four real seller archetypes we identified from Craigslist interaction patterns. The agent reads behavioral signals in real time exactly as it would with real sellers.
+
+ **"Why GRPO instead of PPO?"**
+ GRPO drops PPO's separate value (critic) model. For each prompt it samples a group of completions, scores them with the reward model, and uses each completion's reward relative to the group mean (scaled by the group's standard deviation) as its advantage. That saves memory and removes critic training as a source of instability when fine-tuning a language model on limited hardware. It's the same algorithm DeepSeek-R1 used. Our Phase 1 reward curve (-0.35 to +0.63 over 200 steps) is the evidence it works.
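The advantage step that lets GRPO skip the critic is small enough to show directly. For each prompt, sample a group of completions, score each one, and standardize the scores within the group; the resulting advantages then feed a PPO-style clipped policy-gradient loss (omitted here). This is a plain-Python illustration of that one step, not TRL's implementation; implementations differ on details such as sample vs. population standard deviation.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one prompt's group of sampled completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards a zero-variance group
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for the same prompt, scored by the reward model.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because advantages are relative to the group, a prompt where every completion scores the same contributes no gradient signal, which is why the group size matters.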
+
+ ---
+
+ ## Submission Requirements (do not miss any)
+
+ - [x] Reward model on HF Model Hub — **already built, just needs uploading**
+ - [x] Phase 1 reward curves (Diplomacy GRPO, -0.35 → +0.63) — **already exists, needs clean plot**
+ - [ ] Both envs live on HuggingFace Spaces (OpenEnv 0.2.1)
+ - [ ] Phase 2 reward curves (Contractor GRPO, climbing over 200 steps)
+ - [ ] Colab notebook: full curriculum training loop, runs in one click
+ - [ ] Side-by-side: trained vs baseline on same negotiation
+ - [ ] Full ArbitrAgent demo: $20 → autonomous route execution → final value
+ - [ ] 1-minute YouTube demo video (live agent run, no slides)
+ - [ ] Public GitHub repo with README
+ - [ ] Submit at cerebralvalley.ai by Sunday 1:00 PM
+
+ ---
+
+ *This file is the ground truth for the project. If anything in session_progress.md conflicts with this file, this file wins on architecture and thesis. session_progress.md wins on what has already been built.*
session_progress.md ADDED
@@ -0,0 +1,310 @@
+ # ArbitrAgent — Session Progress
+ **This file is updated at the END of every session.**
+ **The next session reads this before doing anything else.**
+ **Format: add your session block below the last completed one.**
+
+ ---
+
+ ## How To Update This File
+
+ At the end of your session, append a block in this format:
+
+ ```
+ ## Session [N] — [Workstream] — [Date/Time]
+ **Status:** Complete | Partial | Blocked
+
+ ### What Was Built
+ - [specific file or function name]: [what it does]
+
+ ### What Was Tested
+ - [what you ran, what the output was]
+
+ ### Decisions Made
+ - [any architecture or implementation decision made during the session]
+
+ ### Blockers / Known Issues
+ - [anything the next session needs to know or fix]
+
+ ### Files Modified
+ - [list every file touched]
+
+ ### Next Session Entry Point
+ [Exact instruction for what the next session in this workstream should do first]
+ ```
+
+ ---
+
+ ## Session Log
+
+ ## Session 0 — Pre-Work Completed by Teammate — March 7 AM
+
+ **Status:** Complete
+
+ ### What Was Built
+ - `/home/rayyan/Desktop/Play-gent/selfplay_states.json` — 211,278 labeled Diplomacy game states from real Diplomacy data
+ - `/home/rayyan/Desktop/Play-gent/reward_model.pt` — DistilBERT fine-tuned on above data, val loss 0.102
+ - `envs/diplomacy_env.py` — DiplomacyNegotiationEnv, OpenEnv 0.2.1 compliant
+ - `envs/contractor_env.py` — ContractorNegotiationEnv, OpenEnv 0.2.1 compliant (Phase 2 bluff-detection env)
+ - `/home/rayyan/Desktop/Play-gent/grpo_output/checkpoint-200/model.safetensors` — TinyLlama 1.1B, GRPO Phase 1 trained, reward curve -0.35 → +0.63 over 200 steps
+
+ ### What Was Tested
+ - GRPO training run confirmed a climbing reward curve over 200 steps
+ - Both environments confirmed OpenEnv 0.2.1 compliant
+
+ ### Decisions Made
+ - Model is TinyLlama 1.1B (not LLaMA 3.1 8B) — intentional, enables fast inference in demo
+ - Training framework is GRPO (not PPO) — more sample-efficient, same algorithm as DeepSeek-R1
+ - Phase 2 environment is ContractorNegotiationEnv (not PokerNegotiationEnv) — trains identical bluff-detection skills via false-floor contractor scenarios
+
+ ### Blockers / Known Issues
+ - Verify actual file paths above match reality before Session A1 or B1 starts — paths above are best guesses, confirm with teammate
+
+ ### Next Session Entry Points
+ - **Session A1:** Both envs already exist. Verify they smoke test clean (reset, step, render). Do NOT rebuild them. Then set up repo structure around them.
+ - **Session B1:** reward_model.pt and phase1_final.pt already exist. Verify both load and run inference correctly. Do NOT retrain. Generate the Phase 1 reward curve plot for submission evidence.
+
+ ## Session A1+B2 — Infra/Training — March 7 PM
+
+ **Status:** Complete
+
+ ### What Was Built
+ - `envs/human_imitation_env.py`: HumanImitationEnv (OpenEnv 0.2.1) that embeds real Diplomacy game states from `training/data/selfplay_states.json` and provides shaped rewards aligned with human outcomes.
+ - `training/train_phase2.py`: GRPO Phase 2 training script that continues from `grpo_output/checkpoint-200` on HumanImitationEnv without reinitializing weights, logs rewards, and saves to `training/checkpoints/phase2_final`.
+ - `test_all_envs.py`: Unified smoke test script that instantiates and renders `DiplomacyNegotiationEnv`, `ContractorNegotiationEnv`, and `HumanImitationEnv`.
+ - Repository structure folders: `envs/`, `training/` (with `data/` and `checkpoints/`), `agent/`, `simulation/`, `demo/`, `deploy/` created around existing flat files.
+ - Data/checkpoint copies: `reward_model.pt`, `selfplay_states.json`, and `selfplay_states_test.json` copied into the new `training/checkpoints/` and `training/data/` locations (originals preserved at root).
+
+ ### What Was Tested
+ - `python test_all_envs.py` (via project venv): all three envs reset, embed text via `sentence-transformers/all-MiniLM-L6-v2`, render expected state descriptions, and report correct MRO chains; HumanImitationEnv successfully loads 211,278 states from `training/data/selfplay_states.json`.
+ - Verified new virtual environment `.venv` can import `numpy`, `sentence-transformers`, `diplomacy`, `openenv`, `torch`, `transformers`, `trl`, `datasets`, and `matplotlib`.
+ - Launched `python training/train_phase2.py` inside `.venv`; training begins from `grpo_output/checkpoint-200` with GRPOConfig (200 steps, learning rate 5e-6) and logs rewards for plotting.
+
+ ### Decisions Made
+ - Phase 2 environment is implemented as HumanImitationEnv over real Diplomacy states rather than duplicating ContractorNegotiationEnv logic, to keep the curriculum grounded in the 211,278-state dataset while preserving OpenEnv 0.2.1 compatibility.
+ - A dedicated project virtual environment `.venv` is used to avoid touching the system Python, per PEP 668 guidance, and all ML/RL dependencies are installed there.
+ - Phase 2 training continues directly from `grpo_output/checkpoint-200` using the directory path as the model identifier, matching Phase 1 and avoiding accidental reinitialization.
+
+ ### Blockers / Known Issues
+ - Phase 2 GRPO run may take ~1–2 hours on DGX/CPU; ensure logs are monitored and check that `training/checkpoints/phase2_final` and `training/phase2_reward_curve.png` are written successfully before claiming Phase 2 fully done in later sessions.
+ - `sentence-transformers/all-MiniLM-L6-v2` emits a harmless `embeddings.position_ids` UNEXPECTED load warning that can be safely ignored (architecture mismatch note only).
+
+ ### Files Modified
+ - `envs/human_imitation_env.py`
+ - `training/train_phase2.py`
+ - `test_all_envs.py`
+ - `training/data/selfplay_states.json` (copied into new folder; original preserved)
+ - `training/data/selfplay_states_test.json` (copied into new folder; original preserved)
+ - `training/checkpoints/reward_model.pt` (copied into new folder; original preserved)
+ - Project structure: `envs/`, `training/`, `training/data/`, `training/checkpoints/`, `agent/`, `simulation/`, `demo/`, `deploy/` created.
+
+ ### Next Session Entry Point
+ - **Session A2:** After Phase 2 training finishes and `training/checkpoints/phase2_final` exists, load the Phase 2 policy and start implementing `agent/arbitragent.py` and `agent/route_graph.py`. Use the three envs as black boxes and focus on the five-phase agent loop plus route graph scoring. Confirm that the agent can at least open and close one full route in a scripted scenario before adding bluff detection.
+
+ ## Session C1 — Seller Simulation — March 7 PM
+
+ **Status:** Complete
+
+ ### What Was Built
+ - `simulation/seller_profiles.py`: Defines 15+ listings, four seller archetypes (motivated, bluffer, ghoster, trade_curious), eight concrete seller profiles, `TRADE_TARGETS`, `RESPONSE_PROFILES`, and helpers `get_profile`/`get_profiles_by_archetype`.
+ - `simulation/seller_sim.py`: Implements `CraigslistSellerSim` with archetype-aware behavior, ghosting logic, hidden floors, and deterministic bluff injection for the critical `seller_bluffer_camera` profile.
+ - `simulation/scenario.py`: Provides `get_scenario()` that seeds RNG to 42 and returns the standard demo setup (motivated + bluffer camera + ghoster sellers plus trade targets) for deterministic 90-second runs.
+ - `test_seller_sim.py`: CLI harness that walks through scripted message sequences for all four archetypes, printing seller responses, current offers, and route-dead signals.
+
+ ### What Was Tested
+ - `python test_seller_sim.py` (inside `.venv`): confirmed motivated seller walks down toward floor when pushed, bluffer emits the exact canned bluff message at/after the configured trigger turn, ghoster intermittently fails to respond and can leave a route effectively dead, and trade-curious seller resists pure cash but engages on trade-related language.
+ - Multiple runs of `test_seller_sim.py` show stochastic but archetype-consistent patterns (e.g., ghosting frequency, trade-curious resistance, bluff message invariance).
+
+ ### Decisions Made
+ - Seller behavior is implemented as a lightweight rule-based simulator (`CraigslistSellerSim`) instead of calling an external LLM so that the demo remains fast, deterministic, and dependency-light while still exposing realistic bluff/ghost/trade dynamics.
+ - The `seller_bluffer_camera` profile is treated as the canonical demo inject, with explicit `bluff_message` and `bluff_trigger_turn` to align with the project pitch timeline.
+ - Deterministic seeding for the main scenario is handled in `simulation/scenario.py`, while individual seller sims retain stochasticity to keep repeated demos from feeling too scripted.
121
+
122
+ ### Blockers / Known Issues
123
+ - `CraigslistSellerSim` currently ignores any external LLM client; if a future session wires in TinyLlama responses, they should preserve the existing floor/ghost/bluff semantics and only swap out the natural-language surface.
124
+ - Route-dead status is surfaced via `is_dead()`/`status` but not yet consumed by the agent loop; Session A2/A3 should integrate these signals into route graph pruning and bluff detection.
125
+
126
+ ### Files Modified
127
+ - `simulation/seller_profiles.py`
128
+ - `simulation/seller_sim.py`
129
+ - `simulation/scenario.py`
130
+ - `test_seller_sim.py`
131
+
132
+ ### Next Session Entry Point
133
+ - **Session C2 (or A2/C1 follow-up):** Use `simulation/scenario.get_scenario()` inside the future `demo/run_demo.py` to spin up the standard three-seller + trade-target configuration, then plug the trained agent into these sims. Ensure the demo surfaces seller archetype behaviors (bluff, ghost, trade pivot) clearly in the terminal UI.
134
+
135
+
136
+ ## Session A1+B2 — Repo Structure + Phase 2 Setup — March 7 PM
137
+
138
+ **Status:** Complete
139
+
140
+ ### What Was Built
141
+ - `envs/human_imitation_env.py`: `HumanImitationEnv` (OpenEnv 0.2.1) that loads 211,278 real Diplomacy game states and encodes state text with `sentence-transformers/all-MiniLM-L6-v2` for Phase 2 human imitation training.
142
+ - `training/train_phase2.py`: GRPO Phase 2 training script that continues TinyLlama from `grpo_output/checkpoint-200` on human Diplomacy states, logs rewards, and saves the Phase 2 checkpoint and reward curve.
143
+ - `test_all_envs.py`: Smoke test script that instantiates and renders `DiplomacyNegotiationEnv`, `ContractorNegotiationEnv`, and `HumanImitationEnv` and prints their MROs.
144
+ - Repository scaffolding: `envs/`, `training/`, `training/data/`, `training/checkpoints/`, `agent/`, `simulation/`, `demo/`, `deploy/` directories created and populated with existing artifacts (reward model and self-play data copied into `training/` subfolders).
145
+
146
+ ### What Was Tested
147
+ - `python test_all_envs.py` (via `venv`): All three environments reset and rendered successfully, printing realistic Diplomacy, contractor, and human imitation states; each reported the correct MRO and printed `✅ ... OK`, followed by these final lines:
148
+ - `All 3 environments passed smoke test.`
149
+ - `Ready for Phase 2 training.`
150
+ - `python training/train_phase2.py` (via `venv`, with `PYTHONPATH=.`): Confirmed that the script loads the Phase 1 checkpoint, loads 211,278 human game states, builds the GRPO dataset, and begins Phase 2 GRPO training (loading TinyLlama weights and starting iterations) without import errors.
151
+
152
+ ### Decisions Made
153
+ - Use `HumanImitationEnv` as a separate Phase 2 OpenEnv environment that directly leverages the 211,278 Diplomacy states for human imitation, while keeping `ContractorNegotiationEnv` intact for bluff-detection curriculum work.
154
+ - Load `sentence-transformers/all-MiniLM-L6-v2` inside each env instance for consistent 384-dim observation embeddings across Phase 1 and Phase 2 tasks.
155
+ - Drive Phase 2 GRPO training with text-based rewards that score coalition language, aggression, defense, strategic reasoning markers, and bluff/pressure vocabulary, matching the Diplomacy + contractor bluff-detection thesis.
156
+ - Run training from the existing TinyLlama checkpoint path (`grpo_output/checkpoint-200`) rather than reinitializing, to preserve curriculum learning from Phase 1.
157
+
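A keyword reward of this kind can be sketched as shown below. The sketch assumes one fixed bonus for each category whose vocabulary appears at least once. The word lists, the 0.2 weight, and `text_reward` are illustrative placeholders, not the contents of `training/train_phase2.py`.

```python
import re

# Hypothetical vocabularies; the real script's keyword lists are not given in these notes.
CATEGORIES = {
    "coalition": ["alliance", "support", "together", "coalition"],
    "aggression": ["attack", "invade", "strike"],
    "defense": ["hold", "defend", "protect"],
    "reasoning": ["because", "therefore", "if", "then"],
    "pressure": ["final offer", "other buyers", "walk away", "deadline"],
}


def text_reward(completion: str, weight_per_category: float = 0.2) -> float:
    """Award a fixed bonus for each category whose vocabulary appears at least once."""
    text = completion.lower()
    score = 0.0
    for words in CATEGORIES.values():
        # Word-boundary match so "if" does not fire inside "gift".
        if any(re.search(rf"\b{re.escape(w)}\b", text) for w in words):
            score += weight_per_category
    return min(score, 1.0)
```

GRPO normalizes rewards within each group of sampled completions. As a result, a bonus like this only produces a gradient when completions in the same group hit different categories. The deleted `phase2_final/checkpoint-100/trainer_state.json` is consistent with that failure mode: from step 40 on, it logs `frac_reward_zero_std` 1.0, `reward_std` 0.0, and `grad_norm` 0.0.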
158
+ ### Blockers / Known Issues
159
+ - Phase 2 GRPO training is long-running and was started but not completed within this session; reward curve and final checkpoint will materialize as training progresses in `training/checkpoints/phase2_final` and `training/phase2_reward_curve.png`.
160
+ - HF Hub warnings appear due to missing `HF_TOKEN`; this only affects download rate, not correctness, but adding a token would speed up model downloads.
161
+
162
+ ### Files Modified
163
+ - `envs/human_imitation_env.py` (new)
164
+ - `training/train_phase2.py` (new)
165
+ - `test_all_envs.py` (new)
166
+ - `session_progress.md`
167
+ - Directory structure: `envs/`, `training/`, `training/data/`, `training/checkpoints/`, `agent/`, `simulation/`, `demo/`, `deploy/` created or confirmed; existing artifacts copied into `training/` subfolders.
168
+
169
+ ### Next Session Entry Point
170
+ - Verify that Phase 2 GRPO training (`training/train_phase2.py`) has completed and that `training/checkpoints/phase2_final` and `training/phase2_reward_curve.png` exist. Then evaluate the Phase 2 model against the Phase 1 checkpoint on held-out states to confirm improved bluff/human-imitation behavior, and wire the model into the ArbitrAgent loop and demo pipeline.
171
+
172
+ ## Session A2 — Agent Loop + Route Graph — March 7 PM
173
+
174
+ **Status:** Complete
175
+
176
+ ### What Was Built
177
+ - `agent/route_graph.py`: Implements `RouteGraph` and `RouteEdge`, a lightweight route graph with soft/confirmed/dead edges, per-route scoring using the project formula, threshold-based pruning, and helpers to update entry cost, exit value, confirmation probability, and seller reliability.
178
+ - `agent/arbitragent.py`: Implements `ArbitrAgent` with a five-phase loop (Scout, Route Mapping, Pressure & Confirm, Route Scoring, Execute) that uses `simulation.scenario.get_scenario()` and `RouteGraph` to run a full arbitrage episode end-to-end with mocked sellers.
179
+
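Below is a dict-based sketch in the spirit of `RouteGraph`/`RouteEdge`, with soft/confirmed/dead edges and threshold pruning. The notes only say routes are scored with "the project formula," so the score used here, `(exit_value - entry_cost) * confirmation_probability * seller_reliability`, is a placeholder. The method names and the choice to set confirmation probability to 1.0 on confirm are also placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RouteEdge:
    route_id: str
    entry_cost: float
    exit_value: float
    confirmation_probability: float = 0.5
    seller_reliability: float = 1.0
    status: str = "soft"  # "soft" | "confirmed" | "dead"

    def score(self) -> float:
        # Placeholder formula: expected profit discounted by confirmation and reliability.
        if self.status == "dead":
            return 0.0
        profit = self.exit_value - self.entry_cost
        return profit * self.confirmation_probability * self.seller_reliability


@dataclass
class RouteGraph:
    edges: Dict[str, RouteEdge] = field(default_factory=dict)

    def add(self, edge: RouteEdge) -> None:
        self.edges[edge.route_id] = edge

    def confirm(self, route_id: str) -> None:
        edge = self.edges[route_id]
        edge.status = "confirmed"
        edge.confirmation_probability = 1.0  # assumption: a confirmed route is certain

    def kill(self, route_id: str) -> None:
        self.edges[route_id].status = "dead"

    def prune(self, threshold: float) -> List[str]:
        """Mark every live route scoring below threshold as dead; return their ids."""
        pruned = [rid for rid, e in self.edges.items()
                  if e.status != "dead" and e.score() < threshold]
        for rid in pruned:
            self.kill(rid)
        return pruned

    def best(self) -> Optional[RouteEdge]:
        """Highest-scoring confirmed route, or None if nothing is confirmed yet."""
        confirmed = [e for e in self.edges.values() if e.status == "confirmed"]
        return max(confirmed, key=RouteEdge.score, default=None)
```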
180
+ ### What Was Tested
181
+ - `python3 -m agent.arbitragent`: runs the full 5-phase loop using the standard scenario; output shows three buy candidates scored in Phase 1, three routes constructed in Phase 2, deterministic bluff injection and ghosting behavior in Phase 3, scored and pruned routes in Phase 4, and execution of the highest-scoring confirmed route in Phase 5 with final value and profit printed.
182
+
183
+ ### Decisions Made
184
+ - Implemented a custom dict-based `RouteGraph` instead of adding NetworkX to keep dependencies minimal and make it easy to integrate into training and demo code.
185
+ - Treated seller simulations from `simulation/seller_sim.py` as the primary environment for Session A2, deferring integration of the GRPO-trained TinyLlama policy and OpenEnv environments to later sessions, while ensuring the agent loop shape (five phases) matches the project spec.
186
+ - Added simple, deterministic heuristics for scouting (resale demand + trade liquidity + bluff probability) and a stub bluff detector that looks for canonical "final offer" phrasing, so later sessions can swap in a learned model without changing the orchestration surface.
187
+
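The scouting heuristic and string-based stub can be sketched as follows. The sketch assumes each candidate carries `resale_demand`, `trade_liquidity`, and `bluff_probability` features normalized to [0, 1]. The notes list the three terms with "+", so this sketch simply adds them with equal weights. The feature keys, phrase list, and function names are hypothetical, and this is not `_bluff_heuristic` itself.

```python
from typing import Dict, List

FINAL_OFFER_PHRASES = ("final offer", "cant go lower", "can't go lower", "lowest i can go")


def scout_score(c: Dict[str, float]) -> float:
    # Equal-weight sum, following "resale demand + trade liquidity + bluff probability".
    return c["resale_demand"] + c["trade_liquidity"] + c["bluff_probability"]


def rank_candidates(cands: Dict[str, Dict[str, float]]) -> List[str]:
    """Candidate names ordered from most to least attractive buy."""
    return sorted(cands, key=lambda name: scout_score(cands[name]), reverse=True)


def stub_is_bluff(message: str) -> bool:
    """String-level stub: flag canonical 'final offer' phrasing, nothing more."""
    text = message.lower()
    return any(p in text for p in FINAL_OFFER_PHRASES)
```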
188
+ ### Blockers / Known Issues
189
+ - The current `ArbitrAgent` does not yet load or call a trained policy model; all decisions are heuristic and scripted for demo purposes.
190
+ - Bluff detection is intentionally lightweight and string-based; Session A3 should replace `_bluff_heuristic` with a proper signal extractor and eventually the trained curriculum model.
191
+ - The agent loop currently runs from `agent/arbitragent.py`; `demo/run_demo.py` and `demo/display.py` are still stubs and should be implemented to provide the final Rich terminal UI around this loop.
192
+
193
+ ### Files Modified
194
+ - `agent/route_graph.py` (new)
195
+ - `agent/arbitragent.py` (new)
196
+ - `session_progress.md`
197
+
198
+ ### Next Session Entry Point
199
+ - Wire the Phase 2 TinyLlama policy (once `training/checkpoints/phase2_final` exists) into `ArbitrAgent` so that message choices in each phase are generated by the trained model rather than fixed heuristics, and extend the bluff detection logic (or future `agent/bluff_detector.py`) to consume seller thread history and influence route confirmation probabilities within `RouteGraph`.
200
+
201
+ ## Session A3 — Bluff Detector — March 7 PM
202
+
203
+ **Status:** Complete
204
+
205
+ ### What Was Built
206
+ - `agent/bluff_detector.py`: Implements four bluff signals (`timing_tell`, `size_tell`, `formulaic_tell`, `pattern_tell`) plus a weighted `bluff_score` and boolean `is_bluff` flag, with a main `analyze_bluff` API and an `analyze_from_sim` helper for `CraigslistSellerSim`.
207
+ - `test_bluff_detector.py`: Small harness that drives the `seller_bluffer_camera` profile through a scripted negotiation to the canonical bluff message and prints/validates all four signals and the overall bluff flag.
208
+
209
+ ### What Was Tested
210
+ - `python test_bluff_detector.py` (inside `.venv`): For the `seller_bluffer_camera` profile, the scripted sequence reaches the bluff message `"look i really cant go lower than $30, thats my final offer. been getting a lot of interest so"`, and the detector reports `timing_tell = 1.0`, `size_tell = 1.0`, `formulaic_tell = 1.0`, `pattern_tell = 1.0`, `bluff_score = 1.0`, and `is_bluff = True`, with assertions confirming all four signals fire.
211
+
212
+ ### Decisions Made
213
+ - Bluff detection is implemented as deterministic heuristics over seller text and thread history: timing uses `response_speed` and turn index, size inspects round-number price concessions, formulaic checks for canned floor/“final offer” phrases, and pattern compares prior numeric-price concessions against a final formulaic message.
214
+ - The detector is deliberately lightweight and stateless, returning a `BluffSignals` dataclass so that future sessions can adjust weights or thresholds without changing call sites.
215
+
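Here is a compact sketch of the four-signal structure, returning a `BluffSignals` dataclass as described. Each tell is a crude stand-in for the corresponding rule in `agent/bluff_detector.py`. The "fast" speed label, the multiple-of-5 round-number test, the phrase list, and the concession check are all assumptions, and so are the equal 0.25 weights. The sketch uses 0.6 as the threshold because the notes later give that value as the `is_bluff` cutoff; using `>=` rather than `>` is a guess.

```python
import re
from dataclasses import dataclass
from typing import List

CANNED = ("final offer", "cant go lower", "can't go lower", "lowest i can do")
PRICE = re.compile(r"\$(\d+)")


@dataclass
class BluffSignals:
    timing_tell: float
    size_tell: float
    formulaic_tell: float
    pattern_tell: float
    bluff_score: float
    is_bluff: bool


def analyze_bluff_sketch(message: str, history: List[str], turn: int,
                         response_speed: str, threshold: float = 0.6) -> BluffSignals:
    text = message.lower()
    prices = [int(p) for p in PRICE.findall(text)]
    # Timing: a fast reply after the opening turn is treated as rehearsed.
    timing = 1.0 if response_speed == "fast" and turn >= 2 else 0.0
    # Size: a round-number price (multiple of 5) suggests an anchor, not a computed floor.
    size = 1.0 if prices and prices[-1] % 5 == 0 else 0.0
    formulaic = 1.0 if any(p in text for p in CANNED) else 0.0
    # Pattern: earlier messages conceded with concrete prices, then a canned "floor" appears.
    earlier = [int(p) for m in history for p in PRICE.findall(m)]
    conceded = len(earlier) >= 2 and earlier[-1] < earlier[0]
    pattern = 1.0 if conceded and formulaic else 0.0
    score = 0.25 * (timing + size + formulaic + pattern)
    return BluffSignals(timing, size, formulaic, pattern, score, score >= threshold)
```

Given the canonical bluff message and a conceding history, all four tells fire and the score is 1.0, matching the harness result recorded above. By contrast, a fast reply with a round price but no canned phrase scores only 0.5 and stays below the threshold.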
216
+ ### Blockers / Known Issues
217
+ - Bluff detection is not yet wired into `agent/arbitragent.py` or the route graph, so the agent currently does not act on the bluff signals (only the standalone harness uses them).
218
+
219
+ ### Files Modified
220
+ - `agent/bluff_detector.py` (new)
221
+ - `test_bluff_detector.py` (new)
222
+ - `session_progress.md`
223
+
224
+ ### Next Session Entry Point
225
+ - **Session A2/A3 follow-up:** Wire `agent/bluff_detector.analyze_bluff` into the main `arbitragent` loop and route-graph scoring, so that when a bluff is flagged (especially on the `seller_bluffer_camera` profile) the agent immediately deploys coalition pressure (e.g., referencing alternative trade routes) rather than accepting the stated floor at face value.
226
+
227
+ ---
228
+
229
+ ## Session — Unified ArbitrAgent Build — March 7, 2026
230
+
231
+ **Status:** Complete
232
+
233
+ ### What Was Built
234
+ - `envs/arbitragent_env.py`: `ArbitrAgentEnv` (OpenEnv 0.2.1) with three reward signals: accuracy (cosine similarity to the human action from self-play states), outcome (keyword scoring that rewards coalition/pressure moves and clean closes and penalizes premature concessions), and bluff (`BluffDetector` run on a synthetic seller message; a correct flag is rewarded and a missed formulaic tell is penalized). Loads `training/data/selfplay_states.json` and uses `sentence-transformers/all-MiniLM-L6-v2`. `reset()` samples a random state; `step(action)` returns `obs, total_reward, done, info`, with accuracy/outcome/bluff/total in `info`; `render()` includes the last reward breakdown.
235
+ - `training/train_unified.py`: Loads Phase 2 checkpoint from `training/checkpoints/phase2_final`, runs GRPOTrainer on ArbitrAgentEnv (200 steps, lr 5e-6, batch 2), logs accuracy/outcome/bluff to unified_reward_log.json, saves to `training/checkpoints/unified_final/`, plots three-line reward curve to `training/unified_reward_curve.png`, prints final reward values.
236
+ - `agent/arbitragent.py`: BluffDetector wired into Phase 3. After each seller response the agent calls `analyze_from_sim`. When `is_bluff` is set, it logs the full signals, deploys coalition pressure at the stated floor − 4 (“can you do $[floor - 4]?”), and bumps the route's confirmation probability. On an unverified floor claim (formulaic tell but no bluff flag), it logs `unverified_floor_claim`. Each structured log entry includes turn, seller_id, bluff_score, the signals dict, and action_taken.
237
+ - `demo/display.py`: Rich UI with Panel 1 — NEGOTIATION THREADS (seller, item, current offer, status; green/yellow/red/white); Panel 2 — LIVE EVENT LOG ([BLUFF DETECTED], [GOOD OUTCOME], [HUMAN-ALIGNED MOVE], [ROUTE KILLED]); Panel 3 — ROUTE GRAPH (route_id, entry, exit, score, status); Panel 4 — FINAL RESULT (Budget → Deployed → Final Value → Return, route and why).
238
+ - `demo/run_demo.py`: Entry point with budget (default 20) and scenario (default "standard_demo"). Resolves the checkpoint (`unified_final`, falling back to `phase2_final`), runs `get_scenario()` and the full 5-phase loop with display and event_log, applies coalition pressure on bluff (floor − 4), and saves structured JSON to `demo/sample_run_log.json`; tuned to finish in under 90 s.
239
+ - `deploy/hf_spaces_app.py`: First tab “ArbitrAgentEnv — Unified Negotiation Environment” (state, accuracy/outcome/bluff reward breakdown, action input, submit/reset); second tab “Live Demo” with a Run Demo button that streams `run_demo` output. Env calls are wrapped in try/except, and the app launches with `launch(server_name="0.0.0.0", server_port=7860)`.
240
+ - `requirements.txt`: Updated with huggingface_hub, sentence-transformers, torch (CPU index), numpy, tqdm, rich, openenv, gradio, Diplomacy.
241
+ - `training/arbitragent_colab.ipynb`: Updated for unified env — Cell 3 ArbitrAgentEnv reset/render/reward breakdown; Cell 5 run 20 steps GRPO on ArbitrAgentEnv with three signals logged; Cell 6 plot unified reward curve (accuracy, outcome, bluff); Cell 7 bluff scenario inference + BluffDetector; Cell 8 side-by-side base TinyLlama (accepts $30) vs trained (bluff, coalition pressure, $24); markdown headers and summary for curriculum and reward rubric.
242
+
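The accuracy component is described as cosine similarity to the human action. Below is a dependency-free sketch of that computation, plus one way to combine the three components. The notes do not say how the total is formed, so the plain mean, the clipping of negative similarity to zero, and `reward_breakdown` are assumptions. In the env, the vectors would be 384-dim MiniLM embeddings rather than toy lists.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vector: define similarity as 0


def reward_breakdown(action_vec: Sequence[float], human_vec: Sequence[float],
                     outcome: float, bluff: float) -> Dict[str, float]:
    # Assumption: negative similarity earns zero rather than a penalty.
    accuracy = max(0.0, cosine_similarity(action_vec, human_vec))
    total = (accuracy + outcome + bluff) / 3.0  # assumed equal-weight mean
    return {"accuracy": accuracy, "outcome": outcome, "bluff": bluff, "total": total}
```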
243
+ ### What Was Tested
244
+ - Unified training started in tmux session `unified`: `tmux send-keys -t unified "cd ~/Desktop/Play-gent && ... train_unified.py 2>&1 | tee training/unified_training.log"`. Training runs in background.
245
+ - Env and demo code paths were checked for structure and imports only (not run end-to-end); no logic in `simulation/`, `agent/route_graph.py`, or `agent/bluff_detector.py` was changed beyond the specified wiring.
246
+
247
+ ### Decisions Made
248
+ - Coalition pressure uses the stated floor − 4, per spec. An unverified floor claim is logged when `formulaic_tell > 0` but `is_bluff` is false.
249
+ - Demo display receives event_log list and threads with current_offer; Run Demo writes to demo/sample_run_log.json by default.
250
+ - HF Spaces runs `run_demo` in a subprocess with `PYTHONPATH` set and a 90 s timeout; errors are shown in the UI.
251
+
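The Phase 3 reaction to detector output reduces to a small branch. On a bluff, the agent counters at the stated floor minus 4. When the formulaic tell fires without a bluff flag, it logs `unverified_floor_claim`. `react_to_seller`, the dict-shaped log entry, and the 0.1 confirmation bump are hypothetical; the notes only say that the route's confirmation probability is bumped.

```python
from typing import Dict, Optional


def react_to_seller(turn: int, seller_id: str, stated_floor: Optional[int],
                    bluff_score: float, is_bluff: bool, formulaic_tell: float,
                    confirmation_probability: float) -> Dict[str, object]:
    entry: Dict[str, object] = {"turn": turn, "seller_id": seller_id,
                                "bluff_score": bluff_score, "message": None}
    if is_bluff and stated_floor is not None:
        counter = stated_floor - 4  # coalition pressure: stated floor minus 4, per spec
        entry["action_taken"] = "coalition_pressure"
        entry["message"] = f"can you do ${counter}?"
        # Hypothetical bump size, clamped to a valid probability.
        entry["confirmation_probability"] = min(1.0, confirmation_probability + 0.1)
    elif formulaic_tell > 0:
        # Canned floor language without enough evidence of a bluff: log it, do not act.
        entry["action_taken"] = "unverified_floor_claim"
        entry["confirmation_probability"] = confirmation_probability
    else:
        entry["action_taken"] = "continue"
        entry["confirmation_probability"] = confirmation_probability
    return entry
```

With the canonical $30 “final offer”, this produces the counter “can you do $26?”.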
252
+ ### Blockers / Known Issues
253
+ - Unified training (~1 hr) runs in tmux; confirm `training/checkpoints/unified_final` and `training/unified_reward_curve.png` after completion.
254
+ - Colab cell 5 loads TinyLlama from the Hub because `phase2_final` is not available in Colab; loading from HF or a local checkpoint instead is optional if one is available.
255
+
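Running the demo from the Space in a subprocess, with `PYTHONPATH` set and a 90 s timeout, needs only the standard library. This is a sketch, not the code in `deploy/hf_spaces_app.py`; `run_demo_subprocess`, its arguments, and the error strings are assumptions.

```python
import os
import subprocess
import sys
from typing import List


def run_demo_subprocess(repo_root: str, args: List[str], timeout_s: float = 90.0) -> str:
    """Run a Python script from repo_root; return stdout, or an error string for the UI."""
    env = dict(os.environ)
    # Put the repo root first on PYTHONPATH so `agent`, `simulation`, etc. import in the child.
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = repo_root if not existing else repo_root + os.pathsep + existing
    try:
        proc = subprocess.run(
            [sys.executable, *args],
            cwd=repo_root,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return f"Demo timed out after {timeout_s:.0f}s"
    if proc.returncode != 0:
        return f"Demo failed (exit {proc.returncode}):\n{proc.stderr}"
    return proc.stdout
```

For example, `run_demo_subprocess(repo_root, ["demo/run_demo.py"])` returns the demo's stdout. If the run fails or exceeds the timeout, it returns a string the UI can display instead.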
256
+ ### Files Modified
257
+ - `envs/arbitragent_env.py` (new)
258
+ - `training/train_unified.py` (new)
259
+ - `agent/arbitragent.py`
260
+ - `demo/display.py`
261
+ - `demo/run_demo.py`
262
+ - `deploy/hf_spaces_app.py`
263
+ - `requirements.txt`
264
+ - `training/arbitragent_colab.ipynb`
265
+ - `session_progress.md`
266
+
267
+ ### Next Session Entry Point
268
+ - After unified training completes: load `training/checkpoints/unified_final` in demo/agent if desired; verify reward curve and final accuracy/outcome/bluff prints. Run `python demo/run_demo.py` and HF Spaces app end-to-end.
269
+
270
+ ---
271
+
272
+ ## Session — IRC Poker Bluff Classifier + Learned Detector — March 7, 2026
273
+
274
+ **Status:** Complete
275
+
276
+ ### What Was Built
277
+ - `training/parse_poker.py`: Parses all pdb files in `training/data/poker/IRCdata/holdem/199901/pdb/` (files named `pdb.*`). Labels a hand BLUFF=True when the preflop actions contain 'r' or 'b', the hand ends in a fold (the last non-dash action ends in 'f'), and no cards are shown at the end; labels it BLUFF=False for a showdown or a fold with no aggression. Text format: `Position {pos} of {num_players}. Preflop: ... Flop: ... Turn: ... River: ... Pot: {abs(bankroll_change)}.` Saves up to 50,000 examples to `training/data/poker/bluff_labels.json` as `[{"text": "...", "is_bluff": true/false}, ...]` and prints the total example count and class balance.
278
+ - `training/train_bluff_classifier.py`: DistilBERT binary classifier (768→2). Data from `bluff_labels.json`, 80/20 stratified split, 3 epochs, lr 2e-5, batch 32. Saves model to `training/checkpoints/bluff_classifier.pt`, tokenizer to `training/checkpoints/bluff_classifier_tokenizer/`. Prints val accuracy and F1 each epoch; must reach >65% val accuracy.
279
+ - `agent/bluff_detector.py`: Lazy-load of `bluff_classifier.pt` on first use. New `learned_bluff_score(message, thread_history)` converts message+thread to poker-style text and returns P(bluff) from classifier; returns 0.0 if checkpoint missing. Kept existing timing/size/formulaic/pattern as rule_score. New formula: `bluff_score = 0.6 * learned_bluff_score + 0.4 * rule_score` when classifier loaded; else `bluff_score = rule_score`. `analyze_bluff` and `analyze_from_sim` use new scoring; `is_bluff` threshold remains 0.6.
280
+ - `envs/arbitragent_env.py`: `_bluff_reward(action_lower)` now calls `analyze_bluff(SYNTHETIC_BLUFF_PROFILE, SYNTHETIC_THREAD, action_lower)` and returns `signals.bluff_score` as the bluff reward component (no other env changes).
281
+
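The labeling rule and text format can be sketched over per-street action strings. The sketch assumes IRC-style action letters (`r` raise, `b` bet, `f` fold, `-` for a street not reached) and uses a `went_to_showdown` flag to stand in for "cards at end". The `Hand` type and its field names are hypothetical. The preflop string and the final fold both appear verbatim in the generated text, so the label is close to a deterministic function of the input. That may explain the near-perfect validation scores in the deleted `training/bluff_training.log`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hand:
    position: int
    num_players: int
    streets: List[str]  # [preflop, flop, turn, river] action strings, "-" if not reached
    went_to_showdown: bool
    bankroll_change: int


def is_bluff(hand: Hand) -> bool:
    preflop = hand.streets[0]
    aggressive = "r" in preflop or "b" in preflop
    played = [s for s in hand.streets if s != "-"]
    ended_in_fold = bool(played) and played[-1].endswith("f")
    return aggressive and ended_in_fold and not hand.went_to_showdown


def to_text(hand: Hand) -> str:
    pre, flop, turn, river = hand.streets
    return (f"Position {hand.position} of {hand.num_players}. Preflop: {pre} Flop: {flop} "
            f"Turn: {turn} River: {river} Pot: {abs(hand.bankroll_change)}.")
```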
282
+ ### What Was Tested
283
+ - `python training/parse_poker.py`: Parsed 50,000 examples (is_bluff=True 1339, is_bluff=False 48661), saved to `training/data/poker/bluff_labels.json`.
284
+ - Tmux session `bluff` started with `train_bluff_classifier.py` (runs ~20–30 min). Tmux session `unified` started with `train_unified.py` for optional restart after bluff finishes.
285
+
286
+ ### Decisions Made
287
+ - Pdb files are named `pdb.^`, `pdb.A2k`, etc.; parser uses `startswith("pdb.")` and lists directory instead of `*.pdb` glob.
288
+ - Bluff detector loads classifier inline (same architecture as `train_bluff_classifier.BluffClassifier`) to avoid circular imports; no import from `training` in agent at load time.
289
+ - Unified env uses action text as the message passed to `analyze_bluff` so the learned + rule score is the bluff reward.
290
+
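The blended score, which the unified env now returns as its bluff reward, fits in one function. Here, `learned=None` models "classifier not loaded". The 0.6/0.4 weights and the 0.6 threshold come from the notes, while the function name and the `>=` comparison are assumptions.

```python
from typing import Optional, Tuple


def blended_bluff_score(rule_score: float, learned: Optional[float],
                        threshold: float = 0.6) -> Tuple[float, bool]:
    # Classifier loaded: 0.6 * P(bluff) + 0.4 * rule score; otherwise the rule score alone.
    score = rule_score if learned is None else 0.6 * learned + 0.4 * rule_score
    return score, score >= threshold
```

The weights have a notable consequence. Once the classifier is loaded, a perfect rule score of 1.0 no longer guarantees a flag on its own, because 0.6·p + 0.4 ≥ 0.6 requires P(bluff) p ≥ 1/3.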
291
+ ### Blockers / Known Issues
292
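+ - Class balance is very skewed (≈2.7% bluff: 1,339 of 50,000). A classifier that always predicts "not bluff" already scores ≈97.3% accuracy, so the >65% val-accuracy target is not a meaningful bar; F1 on the bluff class is the informative metric. Class weights or more epochs may still be needed to get a useful bluff-class F1.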
293
+ - Run unified training after bluff classifier finishes so the env uses the new detector.
294
+
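The counts from the parse run (1,339 bluff vs 48,661 non-bluff) are enough to compute the all-negative baseline and a set of inverse-frequency class weights. The weight formula below is the one scikit-learn uses for `class_weight="balanced"`: n_samples / (n_classes × n_class). Applying it in `train_bluff_classifier.py` is a suggestion, not something the notes specify.

```python
from typing import Dict


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); taken as 0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def imbalance_report(n_bluff: int, n_clean: int) -> Dict[str, float]:
    total = n_bluff + n_clean
    # Baseline that always predicts "not bluff": TP = 0, FP = 0, FN = n_bluff.
    baseline_accuracy = n_clean / total
    baseline_f1 = f1_from_counts(tp=0, fp=0, fn=n_bluff)
    # "Balanced" inverse-frequency class weights: total / (n_classes * n_class).
    w_bluff = total / (2 * n_bluff)
    w_clean = total / (2 * n_clean)
    return {"baseline_accuracy": baseline_accuracy, "baseline_f1_bluff": baseline_f1,
            "weight_bluff": w_bluff, "weight_clean": w_clean}


report = imbalance_report(1339, 48661)
print({k: round(v, 4) for k, v in report.items()})
# → {'baseline_accuracy': 0.9732, 'baseline_f1_bluff': 0.0, 'weight_bluff': 18.6706, 'weight_clean': 0.5138}
```

Weights like these could be passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.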
295
+ ### Files Modified
296
+ - `training/parse_poker.py` (new)
297
+ - `training/train_bluff_classifier.py` (new)
298
+ - `agent/bluff_detector.py`
299
+ - `envs/arbitragent_env.py`
300
+ - `session_progress.md`
301
+
302
+ ### Next Session Entry Point
303
+ - Check bluff training: `tmux attach -t bluff` (Ctrl+B then D to detach). After it finishes, confirm `training/checkpoints/bluff_classifier.pt` and `bluff_classifier_tokenizer/` exist; then run or re-run unified training in `tmux attach -t unified`.
304
+
305
+ ### Run Order (for reference)
306
+ 1. **Parse poker data:** `cd ~/Desktop/Play-gent && source .venv/bin/activate && PYTHONPATH=. python training/parse_poker.py`
307
+ 2. **Train bluff classifier (tmux, ~20–30 min):** `tmux new-session -d -s bluff` then `tmux send-keys -t bluff "cd ~/Desktop/Play-gent && source .venv/bin/activate && PYTHONPATH=. python training/train_bluff_classifier.py 2>&1 | tee training/bluff_training.log" Enter`
308
+ 3. **After bluff finishes, unified training:** `tmux new-session -d -s unified` then `tmux send-keys -t unified "cd ~/Desktop/Play-gent && source .venv/bin/activate && PYTHONPATH=. python training/train_unified.py 2>&1 | tee training/unified_training.log" Enter`
309
+
310
+ **Monitor tmux:** `tmux attach -t bluff` or `tmux attach -t unified` to watch; detach with Ctrl+B, D. List sessions: `tmux list-sessions`.
training/bluff_training.log DELETED
@@ -1,16 +0,0 @@
1
-
2
- DistilBertModel LOAD REPORT from: distilbert-base-uncased
3
- Key | Status | |
4
- ------------------------+------------+--+-
5
- vocab_transform.weight | UNEXPECTED | |
6
- vocab_projector.bias | UNEXPECTED | |
7
- vocab_transform.bias | UNEXPECTED | |
8
- vocab_layer_norm.bias | UNEXPECTED | |
9
- vocab_layer_norm.weight | UNEXPECTED | |
10
-
11
- Notes:
12
- - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
13
- Epoch 1/3 Val accuracy: 0.9999 Val F1: 0.9981
14
- Epoch 2/3 Val accuracy: 1.0000 Val F1: 1.0000
15
- Epoch 3/3 Val accuracy: 1.0000 Val F1: 1.0000
16
- Saved model to /home/rayyan/Desktop/Play-gent/training/checkpoints/bluff_classifier.pt, tokenizer to /home/rayyan/Desktop/Play-gent/training/checkpoints/bluff_classifier_tokenizer
 
training/checkpoints/bluff_classifier_tokenizer/tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
training/checkpoints/bluff_classifier_tokenizer/tokenizer_config.json DELETED
@@ -1,14 +0,0 @@
1
- {
2
- "backend": "tokenizers",
3
- "cls_token": "[CLS]",
4
- "do_lower_case": true,
5
- "is_local": false,
6
- "mask_token": "[MASK]",
7
- "model_max_length": 512,
8
- "pad_token": "[PAD]",
9
- "sep_token": "[SEP]",
10
- "strip_accents": null,
11
- "tokenize_chinese_chars": true,
12
- "tokenizer_class": "BertTokenizer",
13
- "unk_token": "[UNK]"
14
- }
 
training/checkpoints/phase2_final/README.md DELETED
@@ -1,67 +0,0 @@
1
- ---
2
- library_name: transformers
3
- model_name: phase2_final
4
- tags:
5
- - generated_from_trainer
6
- - trl
7
- - grpo
8
- licence: license
9
- ---
10
-
11
- # Model Card for phase2_final
12
-
13
- This model is a fine-tuned version of [None](https://huggingface.co/None).
14
- It has been trained using [TRL](https://github.com/huggingface/trl).
15
-
16
- ## Quick start
17
-
18
- ```python
19
- from transformers import pipeline
20
-
21
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
22
- generator = pipeline("text-generation", model="None", device="cuda")
23
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
24
- print(output["generated_text"])
25
- ```
26
-
27
- ## Training procedure
28
-
29
-
30
-
31
-
32
-
33
- This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
34
-
35
- ### Framework versions
36
-
37
- - TRL: 0.29.0
38
- - Transformers: 5.3.0
39
- - Pytorch: 2.12.0.dev20260307+cu128
40
- - Datasets: 4.6.1
41
- - Tokenizers: 0.22.2
42
-
43
- ## Citations
44
-
45
- Cite GRPO as:
46
-
47
- ```bibtex
48
- @article{shao2024deepseekmath,
49
- title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
50
- author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
51
- year = 2024,
52
- eprint = {arXiv:2402.03300},
53
- }
54
-
55
- ```
56
-
57
- Cite TRL as:
58
-
59
- ```bibtex
60
- @software{vonwerra2020trl,
61
- title = {{TRL: Transformers Reinforcement Learning}},
62
- author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
63
- license = {Apache-2.0},
64
- url = {https://github.com/huggingface/trl},
65
- year = {2020}
66
- }
67
- ```
 
training/checkpoints/phase2_final/chat_template.jinja DELETED
@@ -1,15 +0,0 @@
1
- {% for message in messages %}
2
- {% if message['role'] == 'user' %}
3
- {{ '<|user|>
4
- ' + message['content'] + eos_token }}
5
- {% elif message['role'] == 'system' %}
6
- {{ '<|system|>
7
- ' + message['content'] + eos_token }}
8
- {% elif message['role'] == 'assistant' %}
9
- {{ '<|assistant|>
10
- ' + message['content'] + eos_token }}
11
- {% endif %}
12
- {% if loop.last and add_generation_prompt %}
13
- {{ '<|assistant|>' }}
14
- {% endif %}
15
- {% endfor %}
 
training/checkpoints/phase2_final/checkpoint-100/chat_template.jinja DELETED
@@ -1,15 +0,0 @@
1
- {% for message in messages %}
2
- {% if message['role'] == 'user' %}
3
- {{ '<|user|>
4
- ' + message['content'] + eos_token }}
5
- {% elif message['role'] == 'system' %}
6
- {{ '<|system|>
7
- ' + message['content'] + eos_token }}
8
- {% elif message['role'] == 'assistant' %}
9
- {{ '<|assistant|>
10
- ' + message['content'] + eos_token }}
11
- {% endif %}
12
- {% if loop.last and add_generation_prompt %}
13
- {{ '<|assistant|>' }}
14
- {% endif %}
15
- {% endfor %}
 
training/checkpoints/phase2_final/checkpoint-100/config.json DELETED
@@ -1,32 +0,0 @@
1
- {
2
- "architectures": [
3
- "LlamaForCausalLM"
4
- ],
5
- "attention_bias": false,
6
- "attention_dropout": 0.0,
7
- "bos_token_id": 1,
8
- "dtype": "float32",
9
- "eos_token_id": 2,
10
- "head_dim": 64,
11
- "hidden_act": "silu",
12
- "hidden_size": 2048,
13
- "initializer_range": 0.02,
14
- "intermediate_size": 5632,
15
- "max_position_embeddings": 2048,
16
- "mlp_bias": false,
17
- "model_type": "llama",
18
- "num_attention_heads": 32,
19
- "num_hidden_layers": 22,
20
- "num_key_value_heads": 4,
21
- "pad_token_id": 2,
22
- "pretraining_tp": 1,
23
- "rms_norm_eps": 1e-05,
24
- "rope_parameters": {
25
- "rope_theta": 10000.0,
26
- "rope_type": "default"
27
- },
28
- "tie_word_embeddings": false,
29
- "transformers_version": "5.3.0",
30
- "use_cache": false,
31
- "vocab_size": 32000
32
- }
 
training/checkpoints/phase2_final/checkpoint-100/generation_config.json DELETED
@@ -1,9 +0,0 @@
1
- {
2
- "bos_token_id": 1,
3
- "eos_token_id": [
4
- 2
5
- ],
6
- "max_length": 2048,
7
- "pad_token_id": 2,
8
- "transformers_version": "5.3.0"
9
- }
 
training/checkpoints/phase2_final/checkpoint-100/tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
training/checkpoints/phase2_final/checkpoint-100/tokenizer_config.json DELETED
@@ -1,19 +0,0 @@
1
- {
2
- "add_prefix_space": null,
3
- "backend": "tokenizers",
4
- "bos_token": "<s>",
5
- "clean_up_tokenization_spaces": false,
6
- "eos_token": "</s>",
7
- "is_local": true,
8
- "max_length": null,
9
- "model_max_length": 2048,
10
- "pad_to_multiple_of": null,
11
- "pad_token": "</s>",
12
- "pad_token_type_id": 0,
13
- "padding_side": "left",
14
- "sp_model_kwargs": {},
15
- "tokenizer_class": "LlamaTokenizer",
16
- "truncation_side": "left",
17
- "unk_token": "<unk>",
18
- "use_default_system_prompt": false
19
- }
 
training/checkpoints/phase2_final/checkpoint-100/trainer_state.json DELETED
@@ -1,304 +0,0 @@
1
- {
2
- "best_global_step": null,
3
- "best_metric": null,
4
- "best_model_checkpoint": null,
5
- "epoch": 0.1,
6
- "eval_steps": 500,
7
- "global_step": 100,
8
- "is_hyper_param_search": false,
9
- "is_local_process_zero": true,
10
- "is_world_process_zero": true,
11
- "log_history": [
12
- {
13
- "clip_ratio/high_max": 0.0,
14
- "clip_ratio/high_mean": 0.0,
15
- "clip_ratio/low_mean": 0.0,
16
- "clip_ratio/low_min": 0.0,
17
- "clip_ratio/region_mean": 0.0,
18
- "completions/clipped_ratio": 1.0,
19
- "completions/max_length": 100.0,
20
- "completions/max_terminated_length": 0.0,
21
- "completions/mean_length": 100.0,
22
- "completions/mean_terminated_length": 0.0,
23
- "completions/min_length": 100.0,
24
- "completions/min_terminated_length": 0.0,
25
- "entropy": 0.7924717187881469,
26
- "epoch": 0.01,
27
- "frac_reward_zero_std": 0.45,
28
- "grad_norm": 1.5374246835708618,
29
- "learning_rate": 4.775e-06,
30
- "loss": 1.4901161193847657e-09,
31
- "num_tokens": 35664.0,
32
- "reward": 0.11875000391155481,
33
- "reward_std": 0.09771842509508133,
34
- "rewards/compute_reward/mean": 0.11875000391155481,
35
- "rewards/compute_reward/std": 0.09771843403577804,
36
- "step": 10,
37
- "step_time": 15.109664801302278
38
- },
39
- {
40
- "clip_ratio/high_max": 0.0,
41
- "clip_ratio/high_mean": 0.0,
42
- "clip_ratio/low_mean": 0.0,
43
- "clip_ratio/low_min": 0.0,
44
- "clip_ratio/region_mean": 0.0,
45
- "completions/clipped_ratio": 1.0,
46
- "completions/max_length": 100.0,
47
- "completions/max_terminated_length": 0.0,
48
- "completions/mean_length": 100.0,
49
- "completions/mean_terminated_length": 0.0,
50
- "completions/min_length": 100.0,
51
- "completions/min_terminated_length": 0.0,
52
- "entropy": 0.8351163290441036,
53
- "epoch": 0.02,
54
- "frac_reward_zero_std": 0.65,
55
- "grad_norm": 0.0,
56
- "learning_rate": 4.525000000000001e-06,
57
- "loss": 2.6822090148925782e-08,
58
- "num_tokens": 70060.0,
59
- "reward": 0.15750000774860382,
60
- "reward_std": 0.04840061739087105,
61
- "rewards/compute_reward/mean": 0.15750000774860382,
62
- "rewards/compute_reward/std": 0.04840061739087105,
63
- "step": 20,
64
- "step_time": 14.928892047195404
65
- },
66
- {
67
- "clip_ratio/high_max": 0.0,
68
- "clip_ratio/high_mean": 0.0,
69
- "clip_ratio/low_mean": 0.0,
70
- "clip_ratio/low_min": 0.0,
71
- "clip_ratio/region_mean": 0.0,
72
- "completions/clipped_ratio": 1.0,
73
- "completions/max_length": 100.0,
74
- "completions/max_terminated_length": 0.0,
75
- "completions/mean_length": 100.0,
76
- "completions/mean_terminated_length": 0.0,
77
- "completions/min_length": 100.0,
78
- "completions/min_terminated_length": 0.0,
79
- "entropy": 0.41533662043511865,
80
- "epoch": 0.03,
81
- "frac_reward_zero_std": 0.8,
82
- "grad_norm": 0.0,
83
- "learning_rate": 4.2750000000000006e-06,
84
- "loss": 1.4901161193847657e-09,
85
- "num_tokens": 105588.0,
86
- "reward": 0.06375000178813935,
87
- "reward_std": 0.04330107718706131,
88
- "rewards/compute_reward/mean": 0.06375000178813935,
89
- "rewards/compute_reward/std": 0.04330108165740967,
90
- "step": 30,
91
- "step_time": 15.109792457801813
92
- },
93
- {
94
- "clip_ratio/high_max": 0.0,
95
- "clip_ratio/high_mean": 0.0,
96
- "clip_ratio/low_mean": 0.0,
97
- "clip_ratio/low_min": 0.0,
98
- "clip_ratio/region_mean": 0.0,
99
- "completions/clipped_ratio": 1.0,
100
- "completions/max_length": 100.0,
101
- "completions/max_terminated_length": 0.0,
102
- "completions/mean_length": 100.0,
103
- "completions/mean_terminated_length": 0.0,
104
- "completions/min_length": 100.0,
105
- "completions/min_terminated_length": 0.0,
106
- "entropy": 1.246315559744835,
107
- "epoch": 0.04,
108
- "frac_reward_zero_std": 1.0,
109
- "grad_norm": 0.0,
110
- "learning_rate": 4.0250000000000004e-06,
111
- "loss": 0.0,
112
- "num_tokens": 141264.0,
113
- "reward": 0.30000001192092896,
114
- "reward_std": 0.0,
115
- "rewards/compute_reward/mean": 0.30000001192092896,
116
- "rewards/compute_reward/std": 0.0,
117
- "step": 40,
118
- "step_time": 15.195196880902222
119
- },
120
- {
121
- "clip_ratio/high_max": 0.0,
122
- "clip_ratio/high_mean": 0.0,
123
- "clip_ratio/low_mean": 0.0,
124
- "clip_ratio/low_min": 0.0,
125
- "clip_ratio/region_mean": 0.0,
126
- "completions/clipped_ratio": 1.0,
127
- "completions/max_length": 100.0,
128
- "completions/max_terminated_length": 0.0,
129
- "completions/mean_length": 100.0,
130
- "completions/mean_terminated_length": 0.0,
131
- "completions/min_length": 100.0,
132
- "completions/min_terminated_length": 0.0,
133
- "entropy": 0.7081560462713241,
134
- "epoch": 0.05,
135
- "frac_reward_zero_std": 1.0,
136
- "grad_norm": 0.0,
137
- "learning_rate": 3.7750000000000003e-06,
138
- "loss": 0.0,
139
- "num_tokens": 176780.0,
140
- "reward": 0.30000001192092896,
141
- "reward_std": 0.0,
142
- "rewards/compute_reward/mean": 0.30000001192092896,
143
- "rewards/compute_reward/std": 0.0,
144
- "step": 50,
145
- "step_time": 15.140776808797819
146
- },
147
- {
148
- "clip_ratio/high_max": 0.0,
149
- "clip_ratio/high_mean": 0.0,
150
- "clip_ratio/low_mean": 0.0,
151
- "clip_ratio/low_min": 0.0,
152
- "clip_ratio/region_mean": 0.0,
153
- "completions/clipped_ratio": 1.0,
154
- "completions/max_length": 100.0,
155
- "completions/max_terminated_length": 0.0,
156
- "completions/mean_length": 100.0,
157
- "completions/mean_terminated_length": 0.0,
158
- "completions/min_length": 100.0,
159
- "completions/min_terminated_length": 0.0,
160
- "entropy": 0.727844113111496,
161
- "epoch": 0.06,
162
- "frac_reward_zero_std": 1.0,
163
- "grad_norm": 0.0,
164
- "learning_rate": 3.525e-06,
165
- "loss": 0.0,
166
- "num_tokens": 212628.0,
167
- "reward": 0.30000001192092896,
168
- "reward_std": 0.0,
169
- "rewards/compute_reward/mean": 0.30000001192092896,
170
- "rewards/compute_reward/std": 0.0,
171
- "step": 60,
172
- "step_time": 15.286061269601486
173
- },
174
- {
175
- "clip_ratio/high_max": 0.0,
176
- "clip_ratio/high_mean": 0.0,
177
- "clip_ratio/low_mean": 0.0,
178
- "clip_ratio/low_min": 0.0,
179
- "clip_ratio/region_mean": 0.0,
180
- "completions/clipped_ratio": 1.0,
181
- "completions/max_length": 100.0,
182
- "completions/max_terminated_length": 0.0,
183
- "completions/mean_length": 100.0,
184
- "completions/mean_terminated_length": 0.0,
185
- "completions/min_length": 100.0,
186
- "completions/min_terminated_length": 0.0,
187
- "entropy": 0.7312307402491569,
188
- "epoch": 0.07,
189
- "frac_reward_zero_std": 1.0,
190
- "grad_norm": 0.0,
191
- "learning_rate": 3.2750000000000004e-06,
192
- "loss": 0.0,
193
- "num_tokens": 248212.0,
194
- "reward": 0.30000001192092896,
195
- "reward_std": 0.0,
196
- "rewards/compute_reward/mean": 0.30000001192092896,
197
- "rewards/compute_reward/std": 0.0,
198
- "step": 70,
199
- "step_time": 15.278303197700733
200
- },
201
- {
202
- "clip_ratio/high_max": 0.0,
203
- "clip_ratio/high_mean": 0.0,
204
- "clip_ratio/low_mean": 0.0,
205
- "clip_ratio/low_min": 0.0,
206
- "clip_ratio/region_mean": 0.0,
207
- "completions/clipped_ratio": 1.0,
208
- "completions/max_length": 100.0,
209
- "completions/max_terminated_length": 0.0,
210
- "completions/mean_length": 100.0,
211
- "completions/mean_terminated_length": 0.0,
212
- "completions/min_length": 100.0,
213
- "completions/min_terminated_length": 0.0,
214
- "entropy": 0.7322262570261955,
215
- "epoch": 0.08,
216
- "frac_reward_zero_std": 1.0,
217
- "grad_norm": 0.0,
218
- "learning_rate": 3.0250000000000003e-06,
219
- "loss": 0.0,
220
- "num_tokens": 283644.0,
221
- "reward": 0.30000001192092896,
222
- "reward_std": 0.0,
223
- "rewards/compute_reward/mean": 0.30000001192092896,
224
- "rewards/compute_reward/std": 0.0,
225
- "step": 80,
226
- "step_time": 15.146252356799959
227
- },
228
- {
229
- "clip_ratio/high_max": 0.0,
230
- "clip_ratio/high_mean": 0.0,
231
- "clip_ratio/low_mean": 0.0,
232
- "clip_ratio/low_min": 0.0,
233
- "clip_ratio/region_mean": 0.0,
234
- "completions/clipped_ratio": 1.0,
235
- "completions/max_length": 100.0,
236
- "completions/max_terminated_length": 0.0,
237
- "completions/mean_length": 100.0,
238
- "completions/mean_terminated_length": 0.0,
239
- "completions/min_length": 100.0,
240
- "completions/min_terminated_length": 0.0,
241
- "entropy": 0.7361132100224494,
242
- "epoch": 0.09,
243
- "frac_reward_zero_std": 1.0,
244
- "grad_norm": 0.0,
245
- "learning_rate": 2.7750000000000005e-06,
246
- "loss": 0.0,
247
- "num_tokens": 318532.0,
248
- "reward": 0.30000001192092896,
249
- "reward_std": 0.0,
250
- "rewards/compute_reward/mean": 0.30000001192092896,
251
- "rewards/compute_reward/std": 0.0,
252
- "step": 90,
253
- "step_time": 15.026733554197563
254
- },
255
- {
256
- "clip_ratio/high_max": 0.0,
257
- "clip_ratio/high_mean": 0.0,
258
- "clip_ratio/low_mean": 0.0,
259
- "clip_ratio/low_min": 0.0,
260
- "clip_ratio/region_mean": 0.0,
261
- "completions/clipped_ratio": 1.0,
262
- "completions/max_length": 100.0,
263
- "completions/max_terminated_length": 0.0,
264
- "completions/mean_length": 100.0,
265
- "completions/mean_terminated_length": 0.0,
266
- "completions/min_length": 100.0,
267
- "completions/min_terminated_length": 0.0,
268
- "entropy": 0.7636664807796478,
269
- "epoch": 0.1,
270
- "frac_reward_zero_std": 1.0,
271
- "grad_norm": 0.0,
272
- "learning_rate": 2.5250000000000004e-06,
273
- "loss": 0.0,
274
- "num_tokens": 355352.0,
275
- "reward": 0.30000001192092896,
276
- "reward_std": 0.0,
277
- "rewards/compute_reward/mean": 0.30000001192092896,
278
- "rewards/compute_reward/std": 0.0,
279
- "step": 100,
280
- "step_time": 15.381215008600702
281
- }
282
- ],
283
- "logging_steps": 10,
284
- "max_steps": 200,
285
- "num_input_tokens_seen": 355352,
286
- "num_train_epochs": 1,
287
- "save_steps": 100,
288
- "stateful_callbacks": {
289
- "TrainerControl": {
290
- "args": {
291
- "should_epoch_stop": false,
292
- "should_evaluate": false,
293
- "should_log": false,
294
- "should_save": true,
295
- "should_training_stop": false
296
- },
297
- "attributes": {}
298
- }
299
- },
300
- "total_flos": 0.0,
301
- "train_batch_size": 2,
302
- "trial_name": null,
303
- "trial_params": null
304
- }
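The `learning_rate` values logged above fall by exactly 2.5e-8 per step. That fits a linear decay to zero over `max_steps` = 200 with no warmup. The base rate of 5e-6 is inferred from the logged numbers and is not stated anywhere in this diff. The one-step offset in the formula (`max_steps - step + 1`) is also read off the data, not taken from trainer documentation. The hypothetical helper `logged_lr` below is a sketch that reproduces the logged values under those assumptions.

```python
# Sketch: reproduce the logged learning rates under an ASSUMED linear schedule.
# BASE_LR = 5e-6 is inferred from the trainer_state values, not read from any config.
BASE_LR = 5e-6
MAX_STEPS = 200  # "max_steps" in trainer_state.json


def logged_lr(step: int, base_lr: float = BASE_LR, max_steps: int = MAX_STEPS) -> float:
    """Learning rate as it appears in log_history at `step` (linear decay, no warmup).

    The (max_steps - step + 1) factor is an empirical fit to the logged numbers.
    """
    return base_lr * (max_steps - step + 1) / max_steps


# Values copied from the checkpoint-100 log_history above.
observed = {
    50: 3.7750000000000003e-06,
    60: 3.525e-06,
    70: 3.2750000000000004e-06,
    80: 3.0250000000000003e-06,
    90: 2.7750000000000005e-06,
    100: 2.5250000000000004e-06,
}

for step, lr in observed.items():
    # Compare with a tiny absolute tolerance to absorb float rounding.
    assert abs(logged_lr(step) - lr) < 1e-15, step
print("all logged learning rates match the linear schedule")
```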
 
training/checkpoints/phase2_final/checkpoint-200/chat_template.jinja DELETED
@@ -1,15 +0,0 @@
1
- {% for message in messages %}
2
- {% if message['role'] == 'user' %}
3
- {{ '<|user|>
4
- ' + message['content'] + eos_token }}
5
- {% elif message['role'] == 'system' %}
6
- {{ '<|system|>
7
- ' + message['content'] + eos_token }}
8
- {% elif message['role'] == 'assistant' %}
9
- {{ '<|assistant|>
10
- ' + message['content'] + eos_token }}
11
- {% endif %}
12
- {% if loop.last and add_generation_prompt %}
13
- {{ '<|assistant|>' }}
14
- {% endif %}
15
- {% endfor %}
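The deleted `chat_template.jinja` wraps each turn as `<|role|>`, a newline, the content and `eos_token`. If `add_generation_prompt` is set, it appends an `<|assistant|>` header after the last message. The sketch below mirrors that logic in plain Python. It assumes the template is rendered with `trim_blocks`/`lstrip_blocks` enabled, which is how transformers compiles chat templates. Under that setting only the newline after each `{{ ... }}` expression survives. `render_chat` is a hypothetical name, and `"</s>"` is the `eos_token` from the tokenizer_config.json shown in this diff.

```python
# Plain-Python mirror of the deleted chat_template.jinja, ASSUMING trim_blocks and
# lstrip_blocks are enabled, so only the "\n" after each {{ ... }} expression is kept.
def render_chat(messages, eos_token="</s>", add_generation_prompt=False):
    parts = []
    for i, message in enumerate(messages):
        role = message["role"]
        # The template only emits user, system and assistant turns; other roles are dropped.
        if role in ("user", "system", "assistant"):
            parts.append(f"<|{role}|>\n{message['content']}{eos_token}\n")
        # The generation prompt is emitted inside the loop, on the last message only.
        if i == len(messages) - 1 and add_generation_prompt:
            parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "system", "content": "Be brief."}, {"role": "user", "content": "Hi"}],
    add_generation_prompt=True,
)
print(repr(prompt))
```

The template checks `loop.last` inside the `for` loop. As a result, an empty message list produces no generation prompt at all.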
 
training/checkpoints/phase2_final/checkpoint-200/config.json DELETED
@@ -1,32 +0,0 @@
1
- {
2
- "architectures": [
3
- "LlamaForCausalLM"
4
- ],
5
- "attention_bias": false,
6
- "attention_dropout": 0.0,
7
- "bos_token_id": 1,
8
- "dtype": "float32",
9
- "eos_token_id": 2,
10
- "head_dim": 64,
11
- "hidden_act": "silu",
12
- "hidden_size": 2048,
13
- "initializer_range": 0.02,
14
- "intermediate_size": 5632,
15
- "max_position_embeddings": 2048,
16
- "mlp_bias": false,
17
- "model_type": "llama",
18
- "num_attention_heads": 32,
19
- "num_hidden_layers": 22,
20
- "num_key_value_heads": 4,
21
- "pad_token_id": 2,
22
- "pretraining_tp": 1,
23
- "rms_norm_eps": 1e-05,
24
- "rope_parameters": {
25
- "rope_theta": 10000.0,
26
- "rope_type": "default"
27
- },
28
- "tie_word_embeddings": false,
29
- "transformers_version": "5.3.0",
30
- "use_cache": false,
31
- "vocab_size": 32000
32
- }
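The config.json above fully determines the model size. The sketch below assumes the standard `LlamaForCausalLM` layout:

- bias-free q/k/v/o projections, with k and v using `num_key_value_heads` (grouped-query attention);
- a bias-free gated SiLU MLP;
- two RMSNorm weight vectors per layer plus a final norm;
- an untied `lm_head`, per `"tie_word_embeddings": false`.

Under these assumptions the total comes to about 1.1B parameters. At the `float32` dtype recorded in the config, that is roughly 4.4 GB of weights per saved checkpoint.

```python
# Count parameters implied by the deleted config.json, ASSUMING the standard
# LlamaForCausalLM layout (no biases, gated MLP, RMSNorm weights, untied lm_head).
config = {
    "hidden_size": 2048,
    "intermediate_size": 5632,
    "num_hidden_layers": 22,
    "num_attention_heads": 32,
    "num_key_value_heads": 4,
    "head_dim": 64,
    "vocab_size": 32000,
    "tie_word_embeddings": False,
}


def count_params(cfg):
    h, inter = cfg["hidden_size"], cfg["intermediate_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]   # 32 * 64 = 2048
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]  # 4 * 64 = 256 (GQA)
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h          # q, k, v, o projections
    mlp = 3 * h * inter                                    # gate, up, down projections
    norms = 2 * h                                          # input + post-attention RMSNorm
    per_layer = attn + mlp + norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


total = count_params(config)
print(f"{total:,} parameters (~{total / 1e9:.2f}B), ~{total * 4 / 1e9:.1f} GB in float32")
```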
 
training/checkpoints/phase2_final/checkpoint-200/generation_config.json DELETED
@@ -1,9 +0,0 @@
1
- {
2
- "bos_token_id": 1,
3
- "eos_token_id": [
4
- 2
5
- ],
6
- "max_length": 2048,
7
- "pad_token_id": 2,
8
- "transformers_version": "5.3.0"
9
- }
 
training/checkpoints/phase2_final/checkpoint-200/tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
training/checkpoints/phase2_final/checkpoint-200/tokenizer_config.json DELETED
@@ -1,19 +0,0 @@
1
- {
2
- "add_prefix_space": null,
3
- "backend": "tokenizers",
4
- "bos_token": "<s>",
5
- "clean_up_tokenization_spaces": false,
6
- "eos_token": "</s>",
7
- "is_local": true,
8
- "max_length": null,
9
- "model_max_length": 2048,
10
- "pad_to_multiple_of": null,
11
- "pad_token": "</s>",
12
- "pad_token_type_id": 0,
13
- "padding_side": "left",
14
- "sp_model_kwargs": {},
15
- "tokenizer_class": "LlamaTokenizer",
16
- "truncation_side": "left",
17
- "unk_token": "<unk>",
18
- "use_default_system_prompt": false
19
- }
 
training/checkpoints/phase2_final/checkpoint-200/trainer_state.json DELETED
@@ -1,574 +0,0 @@
1
- {
2
- "best_global_step": null,
3
- "best_metric": null,
4
- "best_model_checkpoint": null,
5
- "epoch": 0.2,
6
- "eval_steps": 500,
7
- "global_step": 200,
8
- "is_hyper_param_search": false,
9
- "is_local_process_zero": true,
10
- "is_world_process_zero": true,
11
- "log_history": [
12
- {
13
- "clip_ratio/high_max": 0.0,
14
- "clip_ratio/high_mean": 0.0,
15
- "clip_ratio/low_mean": 0.0,
16
- "clip_ratio/low_min": 0.0,
17
- "clip_ratio/region_mean": 0.0,
18
- "completions/clipped_ratio": 1.0,
19
- "completions/max_length": 100.0,
20
- "completions/max_terminated_length": 0.0,
21
- "completions/mean_length": 100.0,
22
- "completions/mean_terminated_length": 0.0,
23
- "completions/min_length": 100.0,
24
- "completions/min_terminated_length": 0.0,
25
- "entropy": 0.7924717187881469,
26
- "epoch": 0.01,
27
- "frac_reward_zero_std": 0.45,
28
- "grad_norm": 1.5374246835708618,
29
- "learning_rate": 4.775e-06,
30
- "loss": 1.4901161193847657e-09,
31
- "num_tokens": 35664.0,
32
- "reward": 0.11875000391155481,
33
- "reward_std": 0.09771842509508133,
34
- "rewards/compute_reward/mean": 0.11875000391155481,
35
- "rewards/compute_reward/std": 0.09771843403577804,
36
- "step": 10,
37
- "step_time": 15.109664801302278
38
- },
39
- {
40
- "clip_ratio/high_max": 0.0,
41
- "clip_ratio/high_mean": 0.0,
42
- "clip_ratio/low_mean": 0.0,
43
- "clip_ratio/low_min": 0.0,
44
- "clip_ratio/region_mean": 0.0,
45
- "completions/clipped_ratio": 1.0,
46
- "completions/max_length": 100.0,
47
- "completions/max_terminated_length": 0.0,
48
- "completions/mean_length": 100.0,
49
- "completions/mean_terminated_length": 0.0,
50
- "completions/min_length": 100.0,
51
- "completions/min_terminated_length": 0.0,
52
- "entropy": 0.8351163290441036,
53
- "epoch": 0.02,
54
- "frac_reward_zero_std": 0.65,
55
- "grad_norm": 0.0,
56
- "learning_rate": 4.525000000000001e-06,
57
- "loss": 2.6822090148925782e-08,
58
- "num_tokens": 70060.0,
59
- "reward": 0.15750000774860382,
60
- "reward_std": 0.04840061739087105,
61
- "rewards/compute_reward/mean": 0.15750000774860382,
62
- "rewards/compute_reward/std": 0.04840061739087105,
63
- "step": 20,
64
- "step_time": 14.928892047195404
65
- },
66
- {
67
- "clip_ratio/high_max": 0.0,
68
- "clip_ratio/high_mean": 0.0,
69
- "clip_ratio/low_mean": 0.0,
70
- "clip_ratio/low_min": 0.0,
71
- "clip_ratio/region_mean": 0.0,
72
- "completions/clipped_ratio": 1.0,
73
- "completions/max_length": 100.0,
74
- "completions/max_terminated_length": 0.0,
75
- "completions/mean_length": 100.0,
76
- "completions/mean_terminated_length": 0.0,
77
- "completions/min_length": 100.0,
78
- "completions/min_terminated_length": 0.0,
79
- "entropy": 0.41533662043511865,
80
- "epoch": 0.03,
81
- "frac_reward_zero_std": 0.8,
82
- "grad_norm": 0.0,
83
- "learning_rate": 4.2750000000000006e-06,
84
- "loss": 1.4901161193847657e-09,
85
- "num_tokens": 105588.0,
86
- "reward": 0.06375000178813935,
87
- "reward_std": 0.04330107718706131,
88
- "rewards/compute_reward/mean": 0.06375000178813935,
89
- "rewards/compute_reward/std": 0.04330108165740967,
90
- "step": 30,
91
- "step_time": 15.109792457801813
92
- },
93
- {
94
- "clip_ratio/high_max": 0.0,
95
- "clip_ratio/high_mean": 0.0,
96
- "clip_ratio/low_mean": 0.0,
97
- "clip_ratio/low_min": 0.0,
98
- "clip_ratio/region_mean": 0.0,
99
- "completions/clipped_ratio": 1.0,
100
- "completions/max_length": 100.0,
101
- "completions/max_terminated_length": 0.0,
102
- "completions/mean_length": 100.0,
103
- "completions/mean_terminated_length": 0.0,
104
- "completions/min_length": 100.0,
105
- "completions/min_terminated_length": 0.0,
106
- "entropy": 1.246315559744835,
107
- "epoch": 0.04,
108
- "frac_reward_zero_std": 1.0,
109
- "grad_norm": 0.0,
110
- "learning_rate": 4.0250000000000004e-06,
111
- "loss": 0.0,
112
- "num_tokens": 141264.0,
113
- "reward": 0.30000001192092896,
114
- "reward_std": 0.0,
115
- "rewards/compute_reward/mean": 0.30000001192092896,
116
- "rewards/compute_reward/std": 0.0,
117
- "step": 40,
118
- "step_time": 15.195196880902222
119
- },
120
- {
121
- "clip_ratio/high_max": 0.0,
122
- "clip_ratio/high_mean": 0.0,
123
- "clip_ratio/low_mean": 0.0,
124
- "clip_ratio/low_min": 0.0,
125
- "clip_ratio/region_mean": 0.0,
126
- "completions/clipped_ratio": 1.0,
127
- "completions/max_length": 100.0,
128
- "completions/max_terminated_length": 0.0,
129
- "completions/mean_length": 100.0,
130
- "completions/mean_terminated_length": 0.0,
131
- "completions/min_length": 100.0,
132
- "completions/min_terminated_length": 0.0,
133
- "entropy": 0.7081560462713241,
134
- "epoch": 0.05,
135
- "frac_reward_zero_std": 1.0,
136
- "grad_norm": 0.0,
137
- "learning_rate": 3.7750000000000003e-06,
138
- "loss": 0.0,
139
- "num_tokens": 176780.0,
140
- "reward": 0.30000001192092896,
141
- "reward_std": 0.0,
142
- "rewards/compute_reward/mean": 0.30000001192092896,
143
- "rewards/compute_reward/std": 0.0,
144
- "step": 50,
145
- "step_time": 15.140776808797819
146
- },
147
- {
148
- "clip_ratio/high_max": 0.0,
149
- "clip_ratio/high_mean": 0.0,
150
- "clip_ratio/low_mean": 0.0,
151
- "clip_ratio/low_min": 0.0,
152
- "clip_ratio/region_mean": 0.0,
153
- "completions/clipped_ratio": 1.0,
154
- "completions/max_length": 100.0,
155
- "completions/max_terminated_length": 0.0,
156
- "completions/mean_length": 100.0,
157
- "completions/mean_terminated_length": 0.0,
158
- "completions/min_length": 100.0,
159
- "completions/min_terminated_length": 0.0,
160
- "entropy": 0.727844113111496,
161
- "epoch": 0.06,
162
- "frac_reward_zero_std": 1.0,
163
- "grad_norm": 0.0,
164
- "learning_rate": 3.525e-06,
165
- "loss": 0.0,
166
- "num_tokens": 212628.0,
167
- "reward": 0.30000001192092896,
168
- "reward_std": 0.0,
169
- "rewards/compute_reward/mean": 0.30000001192092896,
170
- "rewards/compute_reward/std": 0.0,
171
- "step": 60,
172
- "step_time": 15.286061269601486
173
- },
174
- {
175
- "clip_ratio/high_max": 0.0,
176
- "clip_ratio/high_mean": 0.0,
177
- "clip_ratio/low_mean": 0.0,
178
- "clip_ratio/low_min": 0.0,
179
- "clip_ratio/region_mean": 0.0,
180
- "completions/clipped_ratio": 1.0,
181
- "completions/max_length": 100.0,
182
- "completions/max_terminated_length": 0.0,
183
- "completions/mean_length": 100.0,
184
- "completions/mean_terminated_length": 0.0,
185
- "completions/min_length": 100.0,
186
- "completions/min_terminated_length": 0.0,
187
- "entropy": 0.7312307402491569,
188
- "epoch": 0.07,
189
- "frac_reward_zero_std": 1.0,
190
- "grad_norm": 0.0,
191
- "learning_rate": 3.2750000000000004e-06,
192
- "loss": 0.0,
193
- "num_tokens": 248212.0,
194
- "reward": 0.30000001192092896,
195
- "reward_std": 0.0,
196
- "rewards/compute_reward/mean": 0.30000001192092896,
197
- "rewards/compute_reward/std": 0.0,
198
- "step": 70,
199
- "step_time": 15.278303197700733
200
- },
201
- {
202
- "clip_ratio/high_max": 0.0,
203
- "clip_ratio/high_mean": 0.0,
204
- "clip_ratio/low_mean": 0.0,
205
- "clip_ratio/low_min": 0.0,
206
- "clip_ratio/region_mean": 0.0,
207
- "completions/clipped_ratio": 1.0,
208
- "completions/max_length": 100.0,
209
- "completions/max_terminated_length": 0.0,
210
- "completions/mean_length": 100.0,
211
- "completions/mean_terminated_length": 0.0,
212
- "completions/min_length": 100.0,
213
- "completions/min_terminated_length": 0.0,
214
- "entropy": 0.7322262570261955,
215
- "epoch": 0.08,
216
- "frac_reward_zero_std": 1.0,
217
- "grad_norm": 0.0,
218
- "learning_rate": 3.0250000000000003e-06,
219
- "loss": 0.0,
220
- "num_tokens": 283644.0,
221
- "reward": 0.30000001192092896,
222
- "reward_std": 0.0,
223
- "rewards/compute_reward/mean": 0.30000001192092896,
224
- "rewards/compute_reward/std": 0.0,
225
- "step": 80,
226
- "step_time": 15.146252356799959
227
- },
228
- {
229
- "clip_ratio/high_max": 0.0,
230
- "clip_ratio/high_mean": 0.0,
231
- "clip_ratio/low_mean": 0.0,
232
- "clip_ratio/low_min": 0.0,
233
- "clip_ratio/region_mean": 0.0,
234
- "completions/clipped_ratio": 1.0,
235
- "completions/max_length": 100.0,
236
- "completions/max_terminated_length": 0.0,
237
- "completions/mean_length": 100.0,
238
- "completions/mean_terminated_length": 0.0,
239
- "completions/min_length": 100.0,
240
- "completions/min_terminated_length": 0.0,
241
- "entropy": 0.7361132100224494,
242
- "epoch": 0.09,
243
- "frac_reward_zero_std": 1.0,
244
- "grad_norm": 0.0,
245
- "learning_rate": 2.7750000000000005e-06,
246
- "loss": 0.0,
247
- "num_tokens": 318532.0,
248
- "reward": 0.30000001192092896,
249
- "reward_std": 0.0,
250
- "rewards/compute_reward/mean": 0.30000001192092896,
251
- "rewards/compute_reward/std": 0.0,
252
- "step": 90,
253
- "step_time": 15.026733554197563
254
- },
255
- {
256
- "clip_ratio/high_max": 0.0,
257
- "clip_ratio/high_mean": 0.0,
258
- "clip_ratio/low_mean": 0.0,
259
- "clip_ratio/low_min": 0.0,
260
- "clip_ratio/region_mean": 0.0,
261
- "completions/clipped_ratio": 1.0,
262
- "completions/max_length": 100.0,
263
- "completions/max_terminated_length": 0.0,
264
- "completions/mean_length": 100.0,
265
- "completions/mean_terminated_length": 0.0,
266
- "completions/min_length": 100.0,
267
- "completions/min_terminated_length": 0.0,
268
- "entropy": 0.7636664807796478,
269
- "epoch": 0.1,
270
- "frac_reward_zero_std": 1.0,
271
- "grad_norm": 0.0,
272
- "learning_rate": 2.5250000000000004e-06,
273
- "loss": 0.0,
274
- "num_tokens": 355352.0,
275
- "reward": 0.30000001192092896,
276
- "reward_std": 0.0,
277
- "rewards/compute_reward/mean": 0.30000001192092896,
278
- "rewards/compute_reward/std": 0.0,
279
- "step": 100,
280
- "step_time": 15.381215008600702
281
- },
282
- {
283
- "clip_ratio/high_max": 0.0,
284
- "clip_ratio/high_mean": 0.0,
285
- "clip_ratio/low_mean": 0.0,
286
- "clip_ratio/low_min": 0.0,
287
- "clip_ratio/region_mean": 0.0,
288
- "completions/clipped_ratio": 1.0,
289
- "completions/max_length": 100.0,
290
- "completions/max_terminated_length": 0.0,
291
- "completions/mean_length": 100.0,
292
- "completions/mean_terminated_length": 0.0,
293
- "completions/min_length": 100.0,
294
- "completions/min_terminated_length": 0.0,
295
- "entropy": 0.7429351836442948,
296
- "epoch": 0.11,
297
- "frac_reward_zero_std": 1.0,
298
- "grad_norm": 0.0,
299
- "learning_rate": 2.2750000000000002e-06,
300
- "loss": 0.0,
301
- "num_tokens": 389508.0,
302
- "reward": 0.30000001192092896,
303
- "reward_std": 0.0,
304
- "rewards/compute_reward/mean": 0.30000001192092896,
305
- "rewards/compute_reward/std": 0.0,
306
- "step": 110,
307
- "step_time": 15.039604106301704
308
- },
309
- {
310
- "clip_ratio/high_max": 0.0,
311
- "clip_ratio/high_mean": 0.0,
312
- "clip_ratio/low_mean": 0.0,
313
- "clip_ratio/low_min": 0.0,
314
- "clip_ratio/region_mean": 0.0,
315
- "completions/clipped_ratio": 1.0,
316
- "completions/max_length": 100.0,
317
- "completions/max_terminated_length": 0.0,
318
- "completions/mean_length": 100.0,
319
- "completions/mean_terminated_length": 0.0,
320
- "completions/min_length": 100.0,
321
- "completions/min_terminated_length": 0.0,
322
- "entropy": 0.7703481003642082,
323
- "epoch": 0.12,
324
- "frac_reward_zero_std": 1.0,
325
- "grad_norm": 0.0,
326
- "learning_rate": 2.025e-06,
327
- "loss": 0.0,
328
- "num_tokens": 426240.0,
329
- "reward": 0.30000001192092896,
330
- "reward_std": 0.0,
331
- "rewards/compute_reward/mean": 0.30000001192092896,
332
- "rewards/compute_reward/std": 0.0,
333
- "step": 120,
334
- "step_time": 15.29271342299835
335
- },
336
- {
337
- "clip_ratio/high_max": 0.0,
338
- "clip_ratio/high_mean": 0.0,
339
- "clip_ratio/low_mean": 0.0,
340
- "clip_ratio/low_min": 0.0,
341
- "clip_ratio/region_mean": 0.0,
342
- "completions/clipped_ratio": 1.0,
343
- "completions/max_length": 100.0,
344
- "completions/max_terminated_length": 0.0,
345
- "completions/mean_length": 100.0,
346
- "completions/mean_terminated_length": 0.0,
347
- "completions/min_length": 100.0,
348
- "completions/min_terminated_length": 0.0,
349
- "entropy": 0.7375139251351357,
350
- "epoch": 0.13,
351
- "frac_reward_zero_std": 1.0,
352
- "grad_norm": 0.0,
353
- "learning_rate": 1.7750000000000002e-06,
354
- "loss": 0.0,
355
- "num_tokens": 462400.0,
356
- "reward": 0.30000001192092896,
357
- "reward_std": 0.0,
358
- "rewards/compute_reward/mean": 0.30000001192092896,
359
- "rewards/compute_reward/std": 0.0,
360
- "step": 130,
361
- "step_time": 15.20639470120077
362
- },
363
- {
364
- "clip_ratio/high_max": 0.0,
365
- "clip_ratio/high_mean": 0.0,
366
- "clip_ratio/low_mean": 0.0,
367
- "clip_ratio/low_min": 0.0,
368
- "clip_ratio/region_mean": 0.0,
369
- "completions/clipped_ratio": 1.0,
370
- "completions/max_length": 100.0,
371
- "completions/max_terminated_length": 0.0,
372
- "completions/mean_length": 100.0,
373
- "completions/mean_terminated_length": 0.0,
374
- "completions/min_length": 100.0,
375
- "completions/min_terminated_length": 0.0,
376
- "entropy": 0.8568216070532799,
377
- "epoch": 0.14,
378
- "frac_reward_zero_std": 0.9,
379
- "grad_norm": 0.0,
380
- "learning_rate": 1.525e-06,
381
- "loss": 1.7881393432617187e-08,
382
- "num_tokens": 498020.0,
383
- "reward": 0.30500001311302183,
384
- "reward_std": 0.01414213478565216,
385
- "rewards/compute_reward/mean": 0.30500001311302183,
386
- "rewards/compute_reward/std": 0.01414213478565216,
387
- "step": 140,
388
- "step_time": 15.200056954801402
389
- },
390
- {
391
- "clip_ratio/high_max": 0.0,
392
- "clip_ratio/high_mean": 0.0,
393
- "clip_ratio/low_mean": 0.0,
394
- "clip_ratio/low_min": 0.0,
395
- "clip_ratio/region_mean": 0.0,
396
- "completions/clipped_ratio": 1.0,
397
- "completions/max_length": 100.0,
398
- "completions/max_terminated_length": 0.0,
399
- "completions/mean_length": 100.0,
400
- "completions/mean_terminated_length": 0.0,
401
- "completions/min_length": 100.0,
402
- "completions/min_terminated_length": 0.0,
403
- "entropy": 1.4760520339012146,
404
- "epoch": 0.15,
405
- "frac_reward_zero_std": 0.95,
406
- "grad_norm": 0.0,
407
- "learning_rate": 1.275e-06,
408
- "loss": 8.940696716308593e-09,
409
- "num_tokens": 532668.0,
410
- "reward": 0.3025000125169754,
411
- "reward_std": 0.00707106739282608,
412
- "rewards/compute_reward/mean": 0.3025000125169754,
413
- "rewards/compute_reward/std": 0.00707106739282608,
414
- "step": 150,
415
- "step_time": 14.748404727898015
416
- },
417
- {
418
- "clip_ratio/high_max": 0.0,
419
- "clip_ratio/high_mean": 0.0,
420
- "clip_ratio/low_mean": 0.0,
421
- "clip_ratio/low_min": 0.0,
422
- "clip_ratio/region_mean": 0.0,
423
- "completions/clipped_ratio": 1.0,
424
- "completions/max_length": 100.0,
425
- "completions/max_terminated_length": 0.0,
426
- "completions/mean_length": 100.0,
427
- "completions/mean_terminated_length": 0.0,
428
- "completions/min_length": 100.0,
429
- "completions/min_terminated_length": 0.0,
430
- "entropy": 1.7379814833402634,
431
- "epoch": 0.16,
432
- "frac_reward_zero_std": 0.9,
433
- "grad_norm": 0.0,
434
- "learning_rate": 1.025e-06,
435
- "loss": 1.564621925354004e-08,
436
- "num_tokens": 567544.0,
437
- "reward": 0.3037500113248825,
438
- "reward_std": 0.01060660146176815,
439
- "rewards/compute_reward/mean": 0.3037500113248825,
440
- "rewards/compute_reward/std": 0.01060660108923912,
441
- "step": 160,
442
- "step_time": 15.037257523898734
443
- },
444
- {
445
- "clip_ratio/high_max": 0.0,
446
- "clip_ratio/high_mean": 0.0,
447
- "clip_ratio/low_mean": 0.0,
448
- "clip_ratio/low_min": 0.0,
449
- "clip_ratio/region_mean": 0.0,
450
- "completions/clipped_ratio": 1.0,
451
- "completions/max_length": 100.0,
452
- "completions/max_terminated_length": 0.0,
453
- "completions/mean_length": 100.0,
454
- "completions/mean_terminated_length": 0.0,
455
- "completions/min_length": 100.0,
456
- "completions/min_terminated_length": 0.0,
457
- "entropy": 1.5534777998924256,
458
- "epoch": 0.17,
459
- "frac_reward_zero_std": 0.8,
460
- "grad_norm": 0.0,
461
- "learning_rate": 7.750000000000001e-07,
462
- "loss": 1.7881393432617187e-08,
463
- "num_tokens": 604400.0,
464
- "reward": 0.31500001549720763,
465
- "reward_std": 0.032658536732196805,
466
- "rewards/compute_reward/mean": 0.31500001549720763,
467
- "rewards/compute_reward/std": 0.032658536732196805,
468
- "step": 170,
469
- "step_time": 15.339705387198773
470
- },
471
- {
472
- "clip_ratio/high_max": 0.0,
473
- "clip_ratio/high_mean": 0.0,
474
- "clip_ratio/low_mean": 0.0,
475
- "clip_ratio/low_min": 0.0,
476
- "clip_ratio/region_mean": 0.0,
477
- "completions/clipped_ratio": 1.0,
478
- "completions/max_length": 100.0,
479
- "completions/max_terminated_length": 0.0,
480
- "completions/mean_length": 100.0,
481
- "completions/mean_terminated_length": 0.0,
482
- "completions/min_length": 100.0,
483
- "completions/min_terminated_length": 0.0,
484
- "entropy": 1.3570319384336471,
485
- "epoch": 0.18,
486
- "frac_reward_zero_std": 0.9,
487
- "grad_norm": 2.3024227619171143,
488
- "learning_rate": 5.250000000000001e-07,
489
- "loss": 8.195638656616212e-09,
490
- "num_tokens": 639432.0,
491
- "reward": 0.3075000137090683,
492
- "reward_std": 0.02121320217847824,
493
- "rewards/compute_reward/mean": 0.3075000137090683,
494
- "rewards/compute_reward/std": 0.02121320217847824,
495
- "step": 180,
496
- "step_time": 14.838772397398861
497
- },
498
- {
499
- "clip_ratio/high_max": 0.0,
500
- "clip_ratio/high_mean": 0.0,
501
- "clip_ratio/low_mean": 0.0,
502
- "clip_ratio/low_min": 0.0,
503
- "clip_ratio/region_mean": 0.0,
504
- "completions/clipped_ratio": 1.0,
505
- "completions/max_length": 100.0,
506
- "completions/max_terminated_length": 0.0,
507
- "completions/mean_length": 100.0,
508
- "completions/mean_terminated_length": 0.0,
509
- "completions/min_length": 100.0,
510
- "completions/min_terminated_length": 0.0,
511
- "entropy": 1.4456530869007111,
512
- "epoch": 0.19,
513
- "frac_reward_zero_std": 0.75,
514
- "grad_norm": 1.3511810302734375,
515
- "learning_rate": 2.75e-07,
516
- "loss": 4.0978193283081055e-08,
517
- "num_tokens": 674972.0,
518
- "reward": 0.31125001311302186,
519
- "reward_std": 0.03181980364024639,
520
- "rewards/compute_reward/mean": 0.31125001311302186,
521
- "rewards/compute_reward/std": 0.03181980326771736,
522
- "step": 190,
523
- "step_time": 15.081224197598932
524
- },
525
- {
526
- "clip_ratio/high_max": 0.0,
527
- "clip_ratio/high_mean": 0.0,
528
- "clip_ratio/low_mean": 0.0,
529
- "clip_ratio/low_min": 0.0,
530
- "clip_ratio/region_mean": 0.0,
531
- "completions/clipped_ratio": 1.0,
532
- "completions/max_length": 100.0,
533
- "completions/max_terminated_length": 0.0,
534
- "completions/mean_length": 100.0,
535
- "completions/mean_terminated_length": 0.0,
536
- "completions/min_length": 100.0,
537
- "completions/min_terminated_length": 0.0,
538
- "entropy": 1.5674545228481294,
539
- "epoch": 0.2,
540
- "frac_reward_zero_std": 0.75,
541
- "grad_norm": 1.818772792816162,
542
- "learning_rate": 2.5000000000000002e-08,
543
- "loss": 3.129243850708008e-08,
544
- "num_tokens": 709480.0,
545
- "reward": 0.31750001311302184,
546
- "reward_std": 0.04316474497318268,
547
- "rewards/compute_reward/mean": 0.31750001311302184,
548
- "rewards/compute_reward/std": 0.04316474497318268,
549
- "step": 200,
550
- "step_time": 15.07085579989798
551
- }
552
- ],
553
- "logging_steps": 10,
554
- "max_steps": 200,
555
- "num_input_tokens_seen": 709480,
556
- "num_train_epochs": 1,
557
- "save_steps": 100,
558
- "stateful_callbacks": {
559
- "TrainerControl": {
560
- "args": {
561
- "should_epoch_stop": false,
562
- "should_evaluate": false,
563
- "should_log": false,
564
- "should_save": true,
565
- "should_training_stop": true
566
- },
567
- "attributes": {}
568
- }
569
- },
570
- "total_flos": 0.0,
571
- "train_batch_size": 2,
572
- "trial_name": null,
573
- "trial_params": null
574
- }
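From step 40 through step 130, the checkpoint-200 log shows the same pattern at every entry:

- `frac_reward_zero_std` of 1.0;
- `reward` 0.3 with `reward_std` 0.0;
- `grad_norm` 0.0.

Every completion is also clipped at the 100-token limit (`completions/clipped_ratio` 1.0). GRPO, which the model card names as the training method, normalizes rewards within each group of completions. A group whose completions all score the same therefore gets zero advantage and contributes nothing to the policy-gradient term. That is consistent with the zero gradient norms, setting aside any KL penalty.

The sketch below is not TRL's implementation. The `eps` value, the use of the sample standard deviation, and both helper names are assumptions.

```python
from statistics import mean, stdev


def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages, GRPO style: (r - group mean) / (group std + eps).

    eps and the sample (n - 1) standard deviation are assumptions of this sketch.
    """
    mu = mean(rewards)
    sigma = stdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def zero_signal_steps(log_history):
    """Logged steps where every group had identical rewards (frac_reward_zero_std == 1.0)."""
    return [e["step"] for e in log_history if e.get("frac_reward_zero_std") == 1.0]


# Every completion in the group scored 0.3, as in logged steps 40-130.
print(group_advantages([0.3, 0.3, 0.3, 0.3]))  # all zeros: no learning signal
print(group_advantages([0.0, 1.0]))            # opposite-signed advantages

# A few entries copied from the log_history above.
history = [
    {"step": 30, "frac_reward_zero_std": 0.8, "grad_norm": 0.0},
    {"step": 40, "frac_reward_zero_std": 1.0, "grad_norm": 0.0},
    {"step": 190, "frac_reward_zero_std": 0.75, "grad_norm": 1.3511810302734375},
]
print(zero_signal_steps(history))
```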
 
training/checkpoints/phase2_final/config.json DELETED
@@ -1,32 +0,0 @@
1
- {
2
- "architectures": [
3
- "LlamaForCausalLM"
4
- ],
5
- "attention_bias": false,
6
- "attention_dropout": 0.0,
7
- "bos_token_id": 1,
8
- "dtype": "float32",
9
- "eos_token_id": 2,
10
- "head_dim": 64,
11
- "hidden_act": "silu",
12
- "hidden_size": 2048,
13
- "initializer_range": 0.02,
14
- "intermediate_size": 5632,
15
- "max_position_embeddings": 2048,
16
- "mlp_bias": false,
17
- "model_type": "llama",
18
- "num_attention_heads": 32,
19
- "num_hidden_layers": 22,
20
- "num_key_value_heads": 4,
21
- "pad_token_id": 2,
22
- "pretraining_tp": 1,
23
- "rms_norm_eps": 1e-05,
24
- "rope_parameters": {
25
- "rope_theta": 10000.0,
26
- "rope_type": "default"
27
- },
28
- "tie_word_embeddings": false,
29
- "transformers_version": "5.3.0",
30
- "use_cache": false,
31
- "vocab_size": 32000
32
- }
 
training/checkpoints/phase2_final/generation_config.json DELETED
@@ -1,9 +0,0 @@
1
- {
2
- "bos_token_id": 1,
3
- "eos_token_id": [
4
- 2
5
- ],
6
- "max_length": 2048,
7
- "pad_token_id": 2,
8
- "transformers_version": "5.3.0"
9
- }
 
training/checkpoints/phase2_final/tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
training/checkpoints/phase2_final/tokenizer_config.json DELETED
@@ -1,19 +0,0 @@
1
- {
2
- "add_prefix_space": null,
3
- "backend": "tokenizers",
4
- "bos_token": "<s>",
5
- "clean_up_tokenization_spaces": false,
6
- "eos_token": "</s>",
7
- "is_local": true,
8
- "max_length": null,
9
- "model_max_length": 2048,
10
- "pad_to_multiple_of": null,
11
- "pad_token": "</s>",
12
- "pad_token_type_id": 0,
13
- "padding_side": "left",
14
- "sp_model_kwargs": {},
15
- "tokenizer_class": "LlamaTokenizer",
16
- "truncation_side": "left",
17
- "unk_token": "<unk>",
18
- "use_default_system_prompt": false
19
- }
 
training/checkpoints/unified_final/README.md DELETED
@@ -1,67 +0,0 @@
1
- ---
2
- library_name: transformers
3
- model_name: unified_final
4
- tags:
5
- - generated_from_trainer
6
- - trl
7
- - grpo
8
- licence: license
9
- ---
10
-
11
- # Model Card for unified_final
12
-
13
- This model is a fine-tuned version of [None](https://huggingface.co/None).
14
- It has been trained using [TRL](https://github.com/huggingface/trl).
15
-
16
- ## Quick start
17
-
18
- ```python
19
- from transformers import pipeline
20
-
21
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
22
- generator = pipeline("text-generation", model="None", device="cuda")
23
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
24
- print(output["generated_text"])
25
- ```
26
-
27
- ## Training procedure
28
-
29
-
30
-
31
-
32
-
33
- This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
34
-
35
- ### Framework versions
36
-
37
- - TRL: 0.29.0
38
- - Transformers: 5.3.0
39
- - Pytorch: 2.12.0.dev20260307+cu128
40
- - Datasets: 4.6.1
41
- - Tokenizers: 0.22.2
42
-
43
- ## Citations
44
-
45
- Cite GRPO as:
46
-
47
- ```bibtex
48
- @article{shao2024deepseekmath,
49
- title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
50
- author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
51
- year = 2024,
52
- eprint = {arXiv:2402.03300},
53
- }
54
-
55
- ```
56
-
57
- Cite TRL as:
58
-
59
- ```bibtex
60
- @software{vonwerra2020trl,
61
- title = {{TRL: Transformers Reinforcement Learning}},
62
- author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
63
- license = {Apache-2.0},
64
- url = {https://github.com/huggingface/trl},
65
- year = {2020}
66
- }
67
- ```
 
training/checkpoints/unified_final/chat_template.jinja DELETED
@@ -1,15 +0,0 @@
1
- {% for message in messages %}
2
- {% if message['role'] == 'user' %}
3
- {{ '<|user|>
4
- ' + message['content'] + eos_token }}
5
- {% elif message['role'] == 'system' %}
6
- {{ '<|system|>
7
- ' + message['content'] + eos_token }}
8
- {% elif message['role'] == 'assistant' %}
9
- {{ '<|assistant|>
10
- ' + message['content'] + eos_token }}
11
- {% endif %}
12
- {% if loop.last and add_generation_prompt %}
13
- {{ '<|assistant|>' }}
14
- {% endif %}
15
- {% endfor %}
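The deleted `chat_template.jinja` wraps each `user`, `system`, and `assistant` turn in a role tag followed by the EOS token. Any other role falls through the `if`/`elif` chain and produces no output. When `add_generation_prompt` is set, the template appends an `<|assistant|>` tag after the last message. The plain-Python sketch below mirrors that logic so the prompt format can be inspected without loading a tokenizer. It assumes the EOS token is `</s>`, as in the deleted `tokenizer_config.json`. It also assumes the template is rendered with Hugging Face's usual `trim_blocks`/`lstrip_blocks` whitespace handling, so each turn ends in a single newline. Compare the exact whitespace against `tokenizer.apply_chat_template` before relying on it.

```python
ROLE_TAGS = {"user": "<|user|>", "system": "<|system|>", "assistant": "<|assistant|>"}

def render_chat(messages: list[dict[str, str]], eos_token: str = "</s>",
                add_generation_prompt: bool = False) -> str:
    """Plain-Python approximation of the deleted chat_template.jinja (whitespace assumed)."""
    parts = []
    for i, message in enumerate(messages):
        tag = ROLE_TAGS.get(message["role"])
        if tag is not None:  # unknown roles emit nothing, like the template's if/elif chain
            parts.append(f"{tag}\n{message['content']}{eos_token}\n")
        # The template tests loop.last inside the loop, so an empty list emits nothing at all.
        if i == len(messages) - 1 and add_generation_prompt:
            parts.append("<|assistant|>\n")
    return "".join(parts)

# Hypothetical conversation used only to show the layout.
prompt = render_chat(
    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "Call or fold?"}],
    add_generation_prompt=True,
)
print(prompt)
```

Because the generation prompt is emitted inside the loop, calling the template with an empty message list yields an empty string even when `add_generation_prompt` is true.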
 
training/checkpoints/unified_final/checkpoint-100/chat_template.jinja DELETED
@@ -1,15 +0,0 @@
1
- {% for message in messages %}
2
- {% if message['role'] == 'user' %}
3
- {{ '<|user|>
4
- ' + message['content'] + eos_token }}
5
- {% elif message['role'] == 'system' %}
6
- {{ '<|system|>
7
- ' + message['content'] + eos_token }}
8
- {% elif message['role'] == 'assistant' %}
9
- {{ '<|assistant|>
10
- ' + message['content'] + eos_token }}
11
- {% endif %}
12
- {% if loop.last and add_generation_prompt %}
13
- {{ '<|assistant|>' }}
14
- {% endif %}
15
- {% endfor %}
 
training/checkpoints/unified_final/checkpoint-100/config.json DELETED
@@ -1,32 +0,0 @@
1
- {
2
- "architectures": [
3
- "LlamaForCausalLM"
4
- ],
5
- "attention_bias": false,
6
- "attention_dropout": 0.0,
7
- "bos_token_id": 1,
8
- "dtype": "float32",
9
- "eos_token_id": 2,
10
- "head_dim": 64,
11
- "hidden_act": "silu",
12
- "hidden_size": 2048,
13
- "initializer_range": 0.02,
14
- "intermediate_size": 5632,
15
- "max_position_embeddings": 2048,
16
- "mlp_bias": false,
17
- "model_type": "llama",
18
- "num_attention_heads": 32,
19
- "num_hidden_layers": 22,
20
- "num_key_value_heads": 4,
21
- "pad_token_id": 2,
22
- "pretraining_tp": 1,
23
- "rms_norm_eps": 1e-05,
24
- "rope_parameters": {
25
- "rope_theta": 10000.0,
26
- "rope_type": "default"
27
- },
28
- "tie_word_embeddings": false,
29
- "transformers_version": "5.3.0",
30
- "use_cache": false,
31
- "vocab_size": 32000
32
- }
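The deleted `config.json` fully determines the model's size. The sketch below derives the parameter count from the config fields. It assumes the standard Llama layer layout: bias-free q/k/v/o projections, grouped-query attention in which `num_key_value_heads` K/V heads are shared across query heads, a gated SiLU MLP with gate/up/down projections, two RMSNorm weight vectors per layer, and a final norm. Because `tie_word_embeddings` is `false`, the input embedding and the LM head are counted separately. The total comes to about 1.1B parameters, which is consistent with a TinyLlama-1.1B-sized model. With `"dtype": "float32"` at 4 bytes per parameter, the weights alone take roughly 4.4 GB per checkpoint.

```python
import json

# Fields copied from the deleted checkpoint config.json.
CONFIG = json.loads("""{
  "hidden_size": 2048, "intermediate_size": 5632, "num_hidden_layers": 22,
  "num_attention_heads": 32, "num_key_value_heads": 4, "head_dim": 64,
  "vocab_size": 32000, "tie_word_embeddings": false
}""")

def llama_param_count(cfg: dict) -> int:
    """Count parameters of a standard bias-free Llama decoder (assumed layout)."""
    h, ffn = cfg["hidden_size"], cfg["intermediate_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]   # 32 * 64 = 2048
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]  # 4 * 64 = 256 (grouped-query attention)
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h     # q, k, v, o projections
    mlp = 3 * h * ffn                                      # gate, up, down projections
    norms = 2 * h                                          # input + post-attention RMSNorm weights
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm

n = llama_param_count(CONFIG)
print(f"{n:,} parameters, ~{n * 4 / 1e9:.1f} GB in float32")
# → 1,100,048,384 parameters, ~4.4 GB in float32
```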
 
training/checkpoints/unified_final/checkpoint-100/generation_config.json DELETED
@@ -1,9 +0,0 @@
1
- {
2
- "bos_token_id": 1,
3
- "eos_token_id": [
4
- 2
5
- ],
6
- "max_length": 2048,
7
- "pad_token_id": 2,
8
- "transformers_version": "5.3.0"
9
- }
 
training/checkpoints/unified_final/checkpoint-100/tokenizer.json DELETED
The diff for this file is too large to render.
 
training/checkpoints/unified_final/checkpoint-100/tokenizer_config.json DELETED
@@ -1,19 +0,0 @@
1
- {
2
- "add_prefix_space": null,
3
- "backend": "tokenizers",
4
- "bos_token": "<s>",
5
- "clean_up_tokenization_spaces": false,
6
- "eos_token": "</s>",
7
- "is_local": true,
8
- "max_length": null,
9
- "model_max_length": 2048,
10
- "pad_to_multiple_of": null,
11
- "pad_token": "</s>",
12
- "pad_token_type_id": 0,
13
- "padding_side": "left",
14
- "sp_model_kwargs": {},
15
- "tokenizer_class": "LlamaTokenizer",
16
- "truncation_side": "left",
17
- "unk_token": "<unk>",
18
- "use_default_system_prompt": false
19
- }
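The deleted `tokenizer_config.json` reuses `</s>` as the pad token and sets `padding_side` to `"left"`. Left padding is the usual choice for batched generation with decoder-only models. It keeps each prompt's last real token in the final position, so new tokens are generated directly after real text and not after padding. The companion setting `truncation_side: "left"` drops tokens from the start of over-long prompts, which keeps the most recent context. The sketch below shows what left padding does to a batch of token ids. The pad id of 2 comes from the deleted `config.json` and `generation_config.json`, and the example token ids are hypothetical.

```python
PAD_ID = 2  # "pad_token_id": 2 in the deleted configs ("</s>" doubles as the pad token)

def left_pad(batch: list[list[int]], pad_id: int = PAD_ID) -> tuple[list[list[int]], list[list[int]]]:
    """Left-pad token id sequences to equal length and build matching attention masks."""
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))  # 0 = ignore padding positions
    return input_ids, attention_mask

# Hypothetical token ids for two prompts of different lengths.
ids, mask = left_pad([[1, 529, 29989], [1, 15043]])
print(ids)   # [[1, 529, 29989], [2, 1, 15043]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```

Because the pad id equals the EOS id here, the attention mask is the only reliable way to tell padding apart from a genuine `</s>`.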
 
training/checkpoints/unified_final/checkpoint-100/trainer_state.json DELETED
@@ -1,304 +0,0 @@
1
- {
2
- "best_global_step": null,
3
- "best_metric": null,
4
- "best_model_checkpoint": null,
5
- "epoch": 0.1,
6
- "eval_steps": 500,
7
- "global_step": 100,
8
- "is_hyper_param_search": false,
9
- "is_local_process_zero": true,
10
- "is_world_process_zero": true,
11
- "log_history": [
12
- {
13
- "clip_ratio/high_max": 0.0,
14
- "clip_ratio/high_mean": 0.0,
15
- "clip_ratio/low_mean": 0.0,
16
- "clip_ratio/low_min": 0.0,
17
- "clip_ratio/region_mean": 0.0,
18
- "completions/clipped_ratio": 1.0,
19
- "completions/max_length": 100.0,
20
- "completions/max_terminated_length": 0.0,
21
- "completions/mean_length": 100.0,
22
- "completions/mean_terminated_length": 0.0,
23
- "completions/min_length": 100.0,
24
- "completions/min_terminated_length": 0.0,
25
- "entropy": 1.3566992908716202,
26
- "epoch": 0.01,
27
- "frac_reward_zero_std": 0.0,
28
- "grad_norm": 0.7344621419906616,
29
- "learning_rate": 4.775e-06,
30
- "loss": 3.0994415283203126e-07,
31
- "num_tokens": 35800.0,
32
- "reward": 0.01268580500036478,
33
- "reward_std": 0.02462496655061841,
34
- "rewards/compute_reward/mean": 0.01268580500036478,
35
- "rewards/compute_reward/std": 0.024624967435374855,
36
- "step": 10,
37
- "step_time": 10.7134718033034
38
- },
39
- {
40
- "clip_ratio/high_max": 0.0,
41
- "clip_ratio/high_mean": 0.0,
42
- "clip_ratio/low_mean": 0.0,
43
- "clip_ratio/low_min": 0.0,
44
- "clip_ratio/region_mean": 0.0,
45
- "completions/clipped_ratio": 1.0,
46
- "completions/max_length": 100.0,
47
- "completions/max_terminated_length": 0.0,
48
- "completions/mean_length": 100.0,
49
- "completions/mean_terminated_length": 0.0,
50
- "completions/min_length": 100.0,
51
- "completions/min_terminated_length": 0.0,
52
- "entropy": 1.3169427752494811,
53
- "epoch": 0.02,
54
- "frac_reward_zero_std": 0.0,
55
- "grad_norm": 4.441726207733154,
56
- "learning_rate": 4.525000000000001e-06,
57
- "loss": -4.246830940246582e-07,
58
- "num_tokens": 71748.0,
59
- "reward": -0.04455982223153114,
60
- "reward_std": 0.035665383422747256,
61
- "rewards/compute_reward/mean": -0.04455982223153114,
62
- "rewards/compute_reward/std": 0.03566538490122184,
63
- "step": 20,
64
- "step_time": 10.643421414200565
65
- },
66
- {
67
- "clip_ratio/high_max": 0.0,
68
- "clip_ratio/high_mean": 0.0,
69
- "clip_ratio/low_mean": 0.0,
70
- "clip_ratio/low_min": 0.0,
71
- "clip_ratio/region_mean": 0.0,
72
- "completions/clipped_ratio": 0.9875,
73
- "completions/max_length": 100.0,
74
- "completions/max_terminated_length": 8.9,
75
- "completions/mean_length": 99.8625,
76
- "completions/mean_terminated_length": 8.9,
77
- "completions/min_length": 98.9,
78
- "completions/min_terminated_length": 8.9,
79
- "entropy": 1.0057833462953567,
80
- "epoch": 0.03,
81
- "frac_reward_zero_std": 0.0,
82
- "grad_norm": 3.0170326232910156,
83
- "learning_rate": 4.2750000000000006e-06,
84
- "loss": -0.0018164031207561493,
85
- "num_tokens": 108181.0,
86
- "reward": 0.0374881561845541,
87
- "reward_std": 0.020618790527805686,
88
- "rewards/compute_reward/mean": 0.0374881561845541,
89
- "rewards/compute_reward/std": 0.0206187907140702,
90
- "step": 30,
91
- "step_time": 10.756140169796709
92
- },
93
- {
94
- "clip_ratio/high_max": 0.0,
95
- "clip_ratio/high_mean": 0.0,
96
- "clip_ratio/low_mean": 0.0,
97
- "clip_ratio/low_min": 0.0,
98
- "clip_ratio/region_mean": 0.0,
99
- "completions/clipped_ratio": 0.9875,
100
- "completions/max_length": 100.0,
101
- "completions/max_terminated_length": 6.6,
102
- "completions/mean_length": 99.575,
103
- "completions/mean_terminated_length": 6.6,
104
- "completions/min_length": 96.6,
105
- "completions/min_terminated_length": 6.6,
106
- "entropy": 1.7816664546728134,
107
- "epoch": 0.04,
108
- "frac_reward_zero_std": 0.0,
109
- "grad_norm": 5.86561393737793,
110
- "learning_rate": 4.0250000000000004e-06,
111
- "loss": -0.006361240148544311,
112
- "num_tokens": 143375.0,
113
- "reward": -0.014824284799396991,
114
- "reward_std": 0.06699581742286682,
115
- "rewards/compute_reward/mean": -0.014824284799396991,
116
- "rewards/compute_reward/std": 0.06699582003057003,
117
- "step": 40,
118
- "step_time": 10.785410385398427
119
- },
120
- {
121
- "clip_ratio/high_max": 0.0,
122
- "clip_ratio/high_mean": 0.0,
123
- "clip_ratio/low_mean": 0.0,
124
- "clip_ratio/low_min": 0.0,
125
- "clip_ratio/region_mean": 0.0,
126
- "completions/clipped_ratio": 0.9875,
127
- "completions/max_length": 100.0,
128
- "completions/max_terminated_length": 3.0,
129
- "completions/mean_length": 99.125,
130
- "completions/mean_terminated_length": 3.0,
131
- "completions/min_length": 93.0,
132
- "completions/min_terminated_length": 3.0,
133
- "entropy": 2.1307705104351045,
134
- "epoch": 0.05,
135
- "frac_reward_zero_std": 0.0,
136
- "grad_norm": 6.191352367401123,
137
- "learning_rate": 3.7750000000000003e-06,
138
- "loss": -0.011027154326438905,
139
- "num_tokens": 178941.0,
140
- "reward": -0.016337488451972602,
141
- "reward_std": 0.051818730868399145,
142
- "rewards/compute_reward/mean": -0.016337488451972602,
143
- "rewards/compute_reward/std": 0.05181873142719269,
144
- "step": 50,
145
- "step_time": 10.741381045605522
146
- },
147
- {
148
- "clip_ratio/high_max": 0.0,
149
- "clip_ratio/high_mean": 0.0,
150
- "clip_ratio/low_mean": 0.0,
151
- "clip_ratio/low_min": 0.0,
152
- "clip_ratio/region_mean": 0.0,
153
- "completions/clipped_ratio": 0.9875,
154
- "completions/max_length": 100.0,
155
- "completions/max_terminated_length": 8.8,
156
- "completions/mean_length": 99.85,
157
- "completions/mean_terminated_length": 8.8,
158
- "completions/min_length": 98.8,
159
- "completions/min_terminated_length": 8.8,
160
- "entropy": 2.1041357040405275,
161
- "epoch": 0.06,
162
- "frac_reward_zero_std": 0.0,
163
- "grad_norm": 8.536041259765625,
164
- "learning_rate": 3.525e-06,
165
- "loss": 0.0019509844481945039,
166
- "num_tokens": 216257.0,
167
- "reward": 0.035917540453374384,
168
- "reward_std": 0.04930563308298588,
169
- "rewards/compute_reward/mean": 0.035917540453374384,
170
- "rewards/compute_reward/std": 0.049305635318160054,
171
- "step": 60,
172
- "step_time": 11.27133785020269
173
- },
174
- {
175
- "clip_ratio/high_max": 0.0,
176
- "clip_ratio/high_mean": 0.0,
177
- "clip_ratio/low_mean": 0.0,
178
- "clip_ratio/low_min": 0.0,
179
- "clip_ratio/region_mean": 0.0,
180
- "completions/clipped_ratio": 0.8,
181
- "completions/max_length": 100.0,
182
- "completions/max_terminated_length": 48.2,
183
- "completions/mean_length": 92.9625,
184
- "completions/mean_terminated_length": 38.51333351135254,
185
- "completions/min_length": 70.1,
186
- "completions/min_terminated_length": 30.1,
187
- "entropy": 1.6469052851200103,
188
- "epoch": 0.07,
189
- "frac_reward_zero_std": 0.0,
190
- "grad_norm": 6.919373512268066,
191
- "learning_rate": 3.2750000000000004e-06,
192
- "loss": -0.02075239419937134,
193
- "num_tokens": 251110.0,
194
- "reward": 0.007261525164358318,
195
- "reward_std": 0.0802696269005537,
196
- "rewards/compute_reward/mean": 0.007261525164358318,
197
- "rewards/compute_reward/std": 0.08026962876319885,
198
- "step": 70,
199
- "step_time": 10.774873650902009
200
- },
201
- {
202
- "clip_ratio/high_max": 0.0,
203
- "clip_ratio/high_mean": 0.0,
204
- "clip_ratio/low_mean": 0.0,
205
- "clip_ratio/low_min": 0.0,
206
- "clip_ratio/region_mean": 0.0,
207
- "completions/clipped_ratio": 0.9875,
208
- "completions/max_length": 100.0,
209
- "completions/max_terminated_length": 3.1,
210
- "completions/mean_length": 99.1375,
211
- "completions/mean_terminated_length": 3.1,
212
- "completions/min_length": 93.1,
213
- "completions/min_terminated_length": 3.1,
214
- "entropy": 2.2336367428302766,
215
- "epoch": 0.08,
216
- "frac_reward_zero_std": 0.0,
217
- "grad_norm": 4.918172836303711,
218
- "learning_rate": 3.0250000000000003e-06,
219
- "loss": 0.008250368386507034,
220
- "num_tokens": 285729.0,
221
- "reward": 0.027657157555222512,
222
- "reward_std": 0.04840414375066757,
223
- "rewards/compute_reward/mean": 0.027657157555222512,
224
- "rewards/compute_reward/std": 0.048404145427048205,
225
- "step": 80,
226
- "step_time": 10.43483721170196
227
- },
228
- {
229
- "clip_ratio/high_max": 0.0,
230
- "clip_ratio/high_mean": 0.0,
231
- "clip_ratio/low_mean": 0.0,
232
- "clip_ratio/low_min": 0.0,
233
- "clip_ratio/region_mean": 0.0,
234
- "completions/clipped_ratio": 1.0,
235
- "completions/max_length": 100.0,
236
- "completions/max_terminated_length": 0.0,
237
- "completions/mean_length": 100.0,
238
- "completions/mean_terminated_length": 0.0,
239
- "completions/min_length": 100.0,
240
- "completions/min_terminated_length": 0.0,
241
- "entropy": 1.8057245463132858,
242
- "epoch": 0.09,
243
- "frac_reward_zero_std": 0.0,
244
- "grad_norm": 4.417481422424316,
245
- "learning_rate": 2.7750000000000005e-06,
246
- "loss": 2.216547727584839e-08,
247
- "num_tokens": 320249.0,
248
- "reward": 0.07908838111907243,
249
- "reward_std": 0.07920666746795177,
250
- "rewards/compute_reward/mean": 0.07908838111907243,
251
- "rewards/compute_reward/std": 0.07920666970312595,
252
- "step": 90,
253
- "step_time": 10.337220244196942
254
- },
255
- {
256
- "clip_ratio/high_max": 0.0,
257
- "clip_ratio/high_mean": 0.0,
258
- "clip_ratio/low_mean": 0.0,
259
- "clip_ratio/low_min": 0.0,
260
- "clip_ratio/region_mean": 0.0,
261
- "completions/clipped_ratio": 1.0,
262
- "completions/max_length": 100.0,
263
- "completions/max_terminated_length": 0.0,
264
- "completions/mean_length": 100.0,
265
- "completions/mean_terminated_length": 0.0,
266
- "completions/min_length": 100.0,
267
- "completions/min_terminated_length": 0.0,
268
- "entropy": 1.4064194440841675,
269
- "epoch": 0.1,
270
- "frac_reward_zero_std": 0.0,
271
- "grad_norm": 3.352966785430908,
272
- "learning_rate": 2.5250000000000004e-06,
273
- "loss": 8.493661880493164e-08,
274
- "num_tokens": 355369.0,
275
- "reward": 0.14763977155089378,
276
- "reward_std": 0.07424246501177549,
277
- "rewards/compute_reward/mean": 0.14763977155089378,
278
- "rewards/compute_reward/std": 0.0742424676194787,
279
- "step": 100,
280
- "step_time": 10.74917738300719
281
- }
282
- ],
283
- "logging_steps": 10,
284
- "max_steps": 200,
285
- "num_input_tokens_seen": 355369,
286
- "num_train_epochs": 1,
287
- "save_steps": 100,
288
- "stateful_callbacks": {
289
- "TrainerControl": {
290
- "args": {
291
- "should_epoch_stop": false,
292
- "should_evaluate": false,
293
- "should_log": false,
294
- "should_save": true,
295
- "should_training_stop": false
296
- },
297
- "attributes": {}
298
- }
299
- },
300
- "total_flos": 0.0,
301
- "train_batch_size": 2,
302
- "trial_name": null,
303
- "trial_params": null
304
- }
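The logged `learning_rate` values drop by 2.5e-7 every 10 steps and reach 2.525e-06 at step 100 of `max_steps: 200`. This pattern fits a linear decay to zero from a peak of 5e-6 with no warmup, evaluated one step earlier than the logged `step`, which is plausibly the rate in effect for the optimizer step that was just logged. The peak value and the one-step offset are inferred from the logged numbers and are not settings recorded in this file. The sketch below reproduces the logged values under those assumptions.

```python
PEAK_LR = 5e-6   # inferred from the log, not recorded in trainer_state.json
MAX_STEPS = 200  # "max_steps": 200

def linear_lr(step: int, peak: float = PEAK_LR, max_steps: int = MAX_STEPS) -> float:
    """Linear decay to 0 with no warmup, evaluated one step before the logged step."""
    return peak * (max_steps - (step - 1)) / max_steps

# (step, learning_rate) pairs copied from log_history above.
LOGGED = [(10, 4.775e-06), (20, 4.525000000000001e-06),
          (50, 3.7750000000000003e-06), (100, 2.5250000000000004e-06)]

for step, logged in LOGGED:
    predicted = linear_lr(step)
    print(f"step {step:3d}: logged {logged:.4e}, predicted {predicted:.4e}")
    assert abs(predicted - logged) < 1e-12
```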
 
training/checkpoints/unified_final/checkpoint-200/chat_template.jinja DELETED
@@ -1,15 +0,0 @@
1
- {% for message in messages %}
2
- {% if message['role'] == 'user' %}
3
- {{ '<|user|>
4
- ' + message['content'] + eos_token }}
5
- {% elif message['role'] == 'system' %}
6
- {{ '<|system|>
7
- ' + message['content'] + eos_token }}
8
- {% elif message['role'] == 'assistant' %}
9
- {{ '<|assistant|>
10
- ' + message['content'] + eos_token }}
11
- {% endif %}
12
- {% if loop.last and add_generation_prompt %}
13
- {{ '<|assistant|>' }}
14
- {% endif %}
15
- {% endfor %}
 
training/checkpoints/unified_final/checkpoint-200/config.json DELETED
@@ -1,32 +0,0 @@
1
- {
2
- "architectures": [
3
- "LlamaForCausalLM"
4
- ],
5
- "attention_bias": false,
6
- "attention_dropout": 0.0,
7
- "bos_token_id": 1,
8
- "dtype": "float32",
9
- "eos_token_id": 2,
10
- "head_dim": 64,
11
- "hidden_act": "silu",
12
- "hidden_size": 2048,
13
- "initializer_range": 0.02,
14
- "intermediate_size": 5632,
15
- "max_position_embeddings": 2048,
16
- "mlp_bias": false,
17
- "model_type": "llama",
18
- "num_attention_heads": 32,
19
- "num_hidden_layers": 22,
20
- "num_key_value_heads": 4,
21
- "pad_token_id": 2,
22
- "pretraining_tp": 1,
23
- "rms_norm_eps": 1e-05,
24
- "rope_parameters": {
25
- "rope_theta": 10000.0,
26
- "rope_type": "default"
27
- },
28
- "tie_word_embeddings": false,
29
- "transformers_version": "5.3.0",
30
- "use_cache": false,
31
- "vocab_size": 32000
32
- }
 
training/checkpoints/unified_final/checkpoint-200/generation_config.json DELETED
@@ -1,9 +0,0 @@
1
- {
2
- "bos_token_id": 1,
3
- "eos_token_id": [
4
- 2
5
- ],
6
- "max_length": 2048,
7
- "pad_token_id": 2,
8
- "transformers_version": "5.3.0"
9
- }
 
training/checkpoints/unified_final/checkpoint-200/tokenizer.json DELETED
The diff for this file is too large to render.
 
training/checkpoints/unified_final/checkpoint-200/tokenizer_config.json DELETED
@@ -1,19 +0,0 @@
1
- {
2
- "add_prefix_space": null,
3
- "backend": "tokenizers",
4
- "bos_token": "<s>",
5
- "clean_up_tokenization_spaces": false,
6
- "eos_token": "</s>",
7
- "is_local": true,
8
- "max_length": null,
9
- "model_max_length": 2048,
10
- "pad_to_multiple_of": null,
11
- "pad_token": "</s>",
12
- "pad_token_type_id": 0,
13
- "padding_side": "left",
14
- "sp_model_kwargs": {},
15
- "tokenizer_class": "LlamaTokenizer",
16
- "truncation_side": "left",
17
- "unk_token": "<unk>",
18
- "use_default_system_prompt": false
19
- }
 
training/checkpoints/unified_final/checkpoint-200/trainer_state.json DELETED
@@ -1,574 +0,0 @@
1
- {
2
- "best_global_step": null,
3
- "best_metric": null,
4
- "best_model_checkpoint": null,
5
- "epoch": 0.2,
6
- "eval_steps": 500,
7
- "global_step": 200,
8
- "is_hyper_param_search": false,
9
- "is_local_process_zero": true,
10
- "is_world_process_zero": true,
11
- "log_history": [
12
- {
13
- "clip_ratio/high_max": 0.0,
14
- "clip_ratio/high_mean": 0.0,
15
- "clip_ratio/low_mean": 0.0,
16
- "clip_ratio/low_min": 0.0,
17
- "clip_ratio/region_mean": 0.0,
18
- "completions/clipped_ratio": 1.0,
19
- "completions/max_length": 100.0,
20
- "completions/max_terminated_length": 0.0,
21
- "completions/mean_length": 100.0,
22
- "completions/mean_terminated_length": 0.0,
23
- "completions/min_length": 100.0,
24
- "completions/min_terminated_length": 0.0,
25
- "entropy": 1.3566992908716202,
26
- "epoch": 0.01,
27
- "frac_reward_zero_std": 0.0,
28
- "grad_norm": 0.7344621419906616,
29
- "learning_rate": 4.775e-06,
30
- "loss": 3.0994415283203126e-07,
31
- "num_tokens": 35800.0,
32
- "reward": 0.01268580500036478,
33
- "reward_std": 0.02462496655061841,
34
- "rewards/compute_reward/mean": 0.01268580500036478,
35
- "rewards/compute_reward/std": 0.024624967435374855,
36
- "step": 10,
37
- "step_time": 10.7134718033034
38
- },
39
- {
40
- "clip_ratio/high_max": 0.0,
41
- "clip_ratio/high_mean": 0.0,
42
- "clip_ratio/low_mean": 0.0,
43
- "clip_ratio/low_min": 0.0,
44
- "clip_ratio/region_mean": 0.0,
45
- "completions/clipped_ratio": 1.0,
46
- "completions/max_length": 100.0,
47
- "completions/max_terminated_length": 0.0,
48
- "completions/mean_length": 100.0,
49
- "completions/mean_terminated_length": 0.0,
50
- "completions/min_length": 100.0,
51
- "completions/min_terminated_length": 0.0,
52
- "entropy": 1.3169427752494811,
53
- "epoch": 0.02,
54
- "frac_reward_zero_std": 0.0,
55
- "grad_norm": 4.441726207733154,
56
- "learning_rate": 4.525000000000001e-06,
57
- "loss": -4.246830940246582e-07,
58
- "num_tokens": 71748.0,
59
- "reward": -0.04455982223153114,
60
- "reward_std": 0.035665383422747256,
61
- "rewards/compute_reward/mean": -0.04455982223153114,
62
- "rewards/compute_reward/std": 0.03566538490122184,
63
- "step": 20,
64
- "step_time": 10.643421414200565
65
- },
66
- {
67
- "clip_ratio/high_max": 0.0,
68
- "clip_ratio/high_mean": 0.0,
69
- "clip_ratio/low_mean": 0.0,
70
- "clip_ratio/low_min": 0.0,
71
- "clip_ratio/region_mean": 0.0,
72
- "completions/clipped_ratio": 0.9875,
73
- "completions/max_length": 100.0,
74
- "completions/max_terminated_length": 8.9,
75
- "completions/mean_length": 99.8625,
76
- "completions/mean_terminated_length": 8.9,
77
- "completions/min_length": 98.9,
78
- "completions/min_terminated_length": 8.9,
79
- "entropy": 1.0057833462953567,
80
- "epoch": 0.03,
81
- "frac_reward_zero_std": 0.0,
82
- "grad_norm": 3.0170326232910156,
83
- "learning_rate": 4.2750000000000006e-06,
84
- "loss": -0.0018164031207561493,
85
- "num_tokens": 108181.0,
86
- "reward": 0.0374881561845541,
87
- "reward_std": 0.020618790527805686,
88
- "rewards/compute_reward/mean": 0.0374881561845541,
89
- "rewards/compute_reward/std": 0.0206187907140702,
90
- "step": 30,
91
- "step_time": 10.756140169796709
92
- },
93
- {
94
- "clip_ratio/high_max": 0.0,
95
- "clip_ratio/high_mean": 0.0,
96
- "clip_ratio/low_mean": 0.0,
97
- "clip_ratio/low_min": 0.0,
98
- "clip_ratio/region_mean": 0.0,
99
- "completions/clipped_ratio": 0.9875,
100
- "completions/max_length": 100.0,
101
- "completions/max_terminated_length": 6.6,
102
- "completions/mean_length": 99.575,
103
- "completions/mean_terminated_length": 6.6,
104
- "completions/min_length": 96.6,
105
- "completions/min_terminated_length": 6.6,
106
- "entropy": 1.7816664546728134,
107
- "epoch": 0.04,
108
- "frac_reward_zero_std": 0.0,
109
- "grad_norm": 5.86561393737793,
110
- "learning_rate": 4.0250000000000004e-06,
111
- "loss": -0.006361240148544311,
112
- "num_tokens": 143375.0,
113
- "reward": -0.014824284799396991,
114
- "reward_std": 0.06699581742286682,
115
- "rewards/compute_reward/mean": -0.014824284799396991,
116
- "rewards/compute_reward/std": 0.06699582003057003,
117
- "step": 40,
118
- "step_time": 10.785410385398427
119
- },
120
- {
121
- "clip_ratio/high_max": 0.0,
122
- "clip_ratio/high_mean": 0.0,
123
- "clip_ratio/low_mean": 0.0,
124
- "clip_ratio/low_min": 0.0,
125
- "clip_ratio/region_mean": 0.0,
126
- "completions/clipped_ratio": 0.9875,
127
- "completions/max_length": 100.0,
128
- "completions/max_terminated_length": 3.0,
129
- "completions/mean_length": 99.125,
130
- "completions/mean_terminated_length": 3.0,
131
- "completions/min_length": 93.0,
132
- "completions/min_terminated_length": 3.0,
133
- "entropy": 2.1307705104351045,
134
- "epoch": 0.05,
135
- "frac_reward_zero_std": 0.0,
136
- "grad_norm": 6.191352367401123,
137
- "learning_rate": 3.7750000000000003e-06,
138
- "loss": -0.011027154326438905,
139
- "num_tokens": 178941.0,
140
- "reward": -0.016337488451972602,
141
- "reward_std": 0.051818730868399145,
142
- "rewards/compute_reward/mean": -0.016337488451972602,
143
- "rewards/compute_reward/std": 0.05181873142719269,
144
- "step": 50,
145
- "step_time": 10.741381045605522
146
- },
147
- {
148
- "clip_ratio/high_max": 0.0,
149
- "clip_ratio/high_mean": 0.0,
150
- "clip_ratio/low_mean": 0.0,
151
- "clip_ratio/low_min": 0.0,
152
- "clip_ratio/region_mean": 0.0,
153
- "completions/clipped_ratio": 0.9875,
154
- "completions/max_length": 100.0,
155
- "completions/max_terminated_length": 8.8,
156
- "completions/mean_length": 99.85,
157
- "completions/mean_terminated_length": 8.8,
158
- "completions/min_length": 98.8,
159
- "completions/min_terminated_length": 8.8,
160
- "entropy": 2.1041357040405275,
161
- "epoch": 0.06,
162
- "frac_reward_zero_std": 0.0,
163
- "grad_norm": 8.536041259765625,
164
- "learning_rate": 3.525e-06,
165
- "loss": 0.0019509844481945039,
166
- "num_tokens": 216257.0,
167
- "reward": 0.035917540453374384,
168
- "reward_std": 0.04930563308298588,
169
- "rewards/compute_reward/mean": 0.035917540453374384,
170
- "rewards/compute_reward/std": 0.049305635318160054,
171
- "step": 60,
172
- "step_time": 11.27133785020269
173
- },
174
- {
175
- "clip_ratio/high_max": 0.0,
176
- "clip_ratio/high_mean": 0.0,
177
- "clip_ratio/low_mean": 0.0,
178
- "clip_ratio/low_min": 0.0,
179
- "clip_ratio/region_mean": 0.0,
180
- "completions/clipped_ratio": 0.8,
181
- "completions/max_length": 100.0,
182
- "completions/max_terminated_length": 48.2,
183
- "completions/mean_length": 92.9625,
184
- "completions/mean_terminated_length": 38.51333351135254,
185
- "completions/min_length": 70.1,
186
- "completions/min_terminated_length": 30.1,
187
- "entropy": 1.6469052851200103,
188
- "epoch": 0.07,
189
- "frac_reward_zero_std": 0.0,
190
- "grad_norm": 6.919373512268066,
191
- "learning_rate": 3.2750000000000004e-06,
192
- "loss": -0.02075239419937134,
193
- "num_tokens": 251110.0,
194
- "reward": 0.007261525164358318,
195
- "reward_std": 0.0802696269005537,
196
- "rewards/compute_reward/mean": 0.007261525164358318,
197
- "rewards/compute_reward/std": 0.08026962876319885,
198
- "step": 70,
199
- "step_time": 10.774873650902009
200
- },
201
- {
202
- "clip_ratio/high_max": 0.0,
203
- "clip_ratio/high_mean": 0.0,
204
- "clip_ratio/low_mean": 0.0,
205
- "clip_ratio/low_min": 0.0,
206
- "clip_ratio/region_mean": 0.0,
207
- "completions/clipped_ratio": 0.9875,
208
- "completions/max_length": 100.0,
209
- "completions/max_terminated_length": 3.1,
210
- "completions/mean_length": 99.1375,
211
- "completions/mean_terminated_length": 3.1,
212
- "completions/min_length": 93.1,
213
- "completions/min_terminated_length": 3.1,
214
- "entropy": 2.2336367428302766,
215
- "epoch": 0.08,
216
- "frac_reward_zero_std": 0.0,
217
- "grad_norm": 4.918172836303711,
218
- "learning_rate": 3.0250000000000003e-06,
219
- "loss": 0.008250368386507034,
220
- "num_tokens": 285729.0,
221
- "reward": 0.027657157555222512,
222
- "reward_std": 0.04840414375066757,
223
- "rewards/compute_reward/mean": 0.027657157555222512,
224
- "rewards/compute_reward/std": 0.048404145427048205,
225
- "step": 80,
226
- "step_time": 10.43483721170196
227
- },
228
- {
229
- "clip_ratio/high_max": 0.0,
230
- "clip_ratio/high_mean": 0.0,
231
- "clip_ratio/low_mean": 0.0,
232
- "clip_ratio/low_min": 0.0,
233
- "clip_ratio/region_mean": 0.0,
234
- "completions/clipped_ratio": 1.0,
235
- "completions/max_length": 100.0,
236
- "completions/max_terminated_length": 0.0,
237
- "completions/mean_length": 100.0,
238
- "completions/mean_terminated_length": 0.0,
239
- "completions/min_length": 100.0,
240
- "completions/min_terminated_length": 0.0,
241
- "entropy": 1.8057245463132858,
242
- "epoch": 0.09,
243
- "frac_reward_zero_std": 0.0,
244
- "grad_norm": 4.417481422424316,
245
- "learning_rate": 2.7750000000000005e-06,
246
- "loss": 2.216547727584839e-08,
247
- "num_tokens": 320249.0,
248
- "reward": 0.07908838111907243,
249
- "reward_std": 0.07920666746795177,
250
- "rewards/compute_reward/mean": 0.07908838111907243,
251
- "rewards/compute_reward/std": 0.07920666970312595,
252
- "step": 90,
253
- "step_time": 10.337220244196942
254
- },
255
- {
256
- "clip_ratio/high_max": 0.0,
257
- "clip_ratio/high_mean": 0.0,
258
- "clip_ratio/low_mean": 0.0,
259
- "clip_ratio/low_min": 0.0,
260
- "clip_ratio/region_mean": 0.0,
261
- "completions/clipped_ratio": 1.0,
262
- "completions/max_length": 100.0,
263
- "completions/max_terminated_length": 0.0,
264
- "completions/mean_length": 100.0,
265
- "completions/mean_terminated_length": 0.0,
266
- "completions/min_length": 100.0,
267
- "completions/min_terminated_length": 0.0,
268
- "entropy": 1.4064194440841675,
269
- "epoch": 0.1,
270
- "frac_reward_zero_std": 0.0,
271
- "grad_norm": 3.352966785430908,
272
- "learning_rate": 2.5250000000000004e-06,
273
- "loss": 8.493661880493164e-08,
274
- "num_tokens": 355369.0,
275
- "reward": 0.14763977155089378,
276
- "reward_std": 0.07424246501177549,
277
- "rewards/compute_reward/mean": 0.14763977155089378,
278
- "rewards/compute_reward/std": 0.0742424676194787,
279
- "step": 100,
280
- "step_time": 10.74917738300719
281
- },
282
- {
283
- "clip_ratio/high_max": 0.0,
284
- "clip_ratio/high_mean": 0.0,
285
- "clip_ratio/low_mean": 0.0,
286
- "clip_ratio/low_min": 0.0,
287
- "clip_ratio/region_mean": 0.0,
288
- "completions/clipped_ratio": 1.0,
289
- "completions/max_length": 100.0,
290
- "completions/max_terminated_length": 0.0,
291
- "completions/mean_length": 100.0,
292
- "completions/mean_terminated_length": 0.0,
293
- "completions/min_length": 100.0,
294
- "completions/min_terminated_length": 0.0,
295
- "entropy": 1.2582464694976807,
296
- "epoch": 0.11,
297
- "frac_reward_zero_std": 0.0,
298
- "grad_norm": 3.9595463275909424,
299
- "learning_rate": 2.2750000000000002e-06,
300
- "loss": -3.874301910400391e-08,
301
- "num_tokens": 392289.0,
302
- "reward": 0.18278183937072753,
303
- "reward_std": 0.052620683796703815,
304
- "rewards/compute_reward/mean": 0.18278183937072753,
305
- "rewards/compute_reward/std": 0.05262068491429091,
306
- "step": 110,
307
- "step_time": 11.17140419179923
308
- },
309
- {
310
- "clip_ratio/high_max": 0.0,
311
- "clip_ratio/high_mean": 0.0,
312
- "clip_ratio/low_mean": 0.0,
313
- "clip_ratio/low_min": 0.0,
314
- "clip_ratio/region_mean": 0.0,
315
- "completions/clipped_ratio": 1.0,
316
- "completions/max_length": 100.0,
317
- "completions/max_terminated_length": 0.0,
318
- "completions/mean_length": 100.0,
319
- "completions/mean_terminated_length": 0.0,
320
- "completions/min_length": 100.0,
321
- "completions/min_terminated_length": 0.0,
322
- "entropy": 0.8805452413856983,
323
- "epoch": 0.12,
324
- "frac_reward_zero_std": 0.0,
325
- "grad_norm": 2.707214593887329,
326
- "learning_rate": 2.025e-06,
327
- "loss": 1.5050172805786132e-07,
328
- "num_tokens": 430501.0,
329
- "reward": 0.22903144657611846,
330
- "reward_std": 0.04029850559309125,
331
- "rewards/compute_reward/mean": 0.22903144657611846,
332
- "rewards/compute_reward/std": 0.04029850568622351,
333
- "step": 120,
334
- "step_time": 11.244449263699062
335
- },
336
- {
337
- "clip_ratio/high_max": 0.0,
338
- "clip_ratio/high_mean": 0.0,
339
- "clip_ratio/low_mean": 0.0,
340
- "clip_ratio/low_min": 0.0,
341
- "clip_ratio/region_mean": 0.0,
342
- "completions/clipped_ratio": 1.0,
343
- "completions/max_length": 100.0,
344
- "completions/max_terminated_length": 0.0,
345
- "completions/mean_length": 100.0,
346
- "completions/mean_terminated_length": 0.0,
347
- "completions/min_length": 100.0,
348
- "completions/min_terminated_length": 0.0,
349
- "entropy": 0.8755271568894386,
350
- "epoch": 0.13,
351
- "frac_reward_zero_std": 0.0,
352
- "grad_norm": 3.942605495452881,
353
- "learning_rate": 1.7750000000000002e-06,
354
- "loss": 1.2218952178955077e-07,
355
- "num_tokens": 467245.0,
356
- "reward": 0.18334048390388488,
357
- "reward_std": 0.07254596166312695,
358
- "rewards/compute_reward/mean": 0.18334048390388488,
359
- "rewards/compute_reward/std": 0.072545962408185,
360
- "step": 130,
361
- "step_time": 11.071729802998016
362
- },
363
- {
364
- "clip_ratio/high_max": 0.0,
365
- "clip_ratio/high_mean": 0.0,
366
- "clip_ratio/low_mean": 0.0,
367
- "clip_ratio/low_min": 0.0,
368
- "clip_ratio/region_mean": 0.0,
369
- "completions/clipped_ratio": 1.0,
370
- "completions/max_length": 100.0,
371
- "completions/max_terminated_length": 0.0,
372
- "completions/mean_length": 100.0,
373
- "completions/mean_terminated_length": 0.0,
374
- "completions/min_length": 100.0,
375
- "completions/min_terminated_length": 0.0,
376
- "entropy": 0.9737002968788147,
377
- "epoch": 0.14,
378
- "frac_reward_zero_std": 0.0,
379
- "grad_norm": 4.040837287902832,
380
- "learning_rate": 1.525e-06,
381
- "loss": -1.4007091522216797e-07,
382
- "num_tokens": 503017.0,
383
- "reward": 0.20783505886793135,
384
- "reward_std": 0.06580547224730253,
385
- "rewards/compute_reward/mean": 0.20783505886793135,
386
- "rewards/compute_reward/std": 0.06580547466874123,
387
- "step": 140,
388
- "step_time": 10.841636341501726
389
- },
390
- {
391
- "clip_ratio/high_max": 0.0,
392
- "clip_ratio/high_mean": 0.0,
393
- "clip_ratio/low_mean": 0.0,
394
- "clip_ratio/low_min": 0.0,
395
- "clip_ratio/region_mean": 0.0,
396
- "completions/clipped_ratio": 1.0,
397
- "completions/max_length": 100.0,
398
- "completions/max_terminated_length": 0.0,
399
- "completions/mean_length": 100.0,
400
- "completions/mean_terminated_length": 0.0,
401
- "completions/min_length": 100.0,
402
- "completions/min_terminated_length": 0.0,
403
- "entropy": 0.9901166066527367,
404
- "epoch": 0.15,
405
- "frac_reward_zero_std": 0.0,
406
- "grad_norm": 3.720881462097168,
407
- "learning_rate": 1.275e-06,
408
- "loss": 2.0861625671386717e-08,
409
- "num_tokens": 539801.0,
410
- "reward": 0.2224348157644272,
411
- "reward_std": 0.05879365894943476,
412
- "rewards/compute_reward/mean": 0.2224348157644272,
413
- "rewards/compute_reward/std": 0.05879366043955088,
414
- "step": 150,
415
- "step_time": 10.85469058619783
416
- },
417
- {
418
- "clip_ratio/high_max": 0.0,
419
- "clip_ratio/high_mean": 0.0,
420
- "clip_ratio/low_mean": 0.0,
421
- "clip_ratio/low_min": 0.0,
422
- "clip_ratio/region_mean": 0.0,
423
- "completions/clipped_ratio": 1.0,
424
- "completions/max_length": 100.0,
425
- "completions/max_terminated_length": 0.0,
426
- "completions/mean_length": 100.0,
427
- "completions/mean_terminated_length": 0.0,
428
- "completions/min_length": 100.0,
429
- "completions/min_terminated_length": 0.0,
430
- "entropy": 1.1208710052073,
431
- "epoch": 0.16,
432
- "frac_reward_zero_std": 0.0,
433
- "grad_norm": 3.452557325363159,
434
- "learning_rate": 1.025e-06,
435
- "loss": 1.4603137969970704e-07,
436
- "num_tokens": 575385.0,
437
- "reward": 0.1992661789059639,
438
- "reward_std": 0.06030977526679635,
439
- "rewards/compute_reward/mean": 0.1992661789059639,
440
- "rewards/compute_reward/std": 0.060309774987399575,
441
- "step": 160,
442
- "step_time": 10.620040459206212
443
- },
444
- {
445
- "clip_ratio/high_max": 0.0,
446
- "clip_ratio/high_mean": 0.0,
447
- "clip_ratio/low_mean": 0.0,
448
- "clip_ratio/low_min": 0.0,
449
- "clip_ratio/region_mean": 0.0,
450
- "completions/clipped_ratio": 0.9875,
451
- "completions/max_length": 100.0,
452
- "completions/max_terminated_length": 8.5,
453
- "completions/mean_length": 99.8125,
454
- "completions/mean_terminated_length": 8.5,
455
- "completions/min_length": 98.5,
456
- "completions/min_terminated_length": 8.5,
457
- "entropy": 0.943237779289484,
458
- "epoch": 0.17,
459
- "frac_reward_zero_std": 0.0,
460
- "grad_norm": 3.998199701309204,
461
- "learning_rate": 7.750000000000001e-07,
462
- "loss": 0.0005225777626037597,
463
- "num_tokens": 611998.0,
464
- "reward": 0.21552147567272187,
465
- "reward_std": 0.032230423856526615,
466
- "rewards/compute_reward/mean": 0.21552147567272187,
467
- "rewards/compute_reward/std": 0.0322304243221879,
468
- "step": 170,
469
- "step_time": 10.901679297701047
470
- },
471
- {
472
- "clip_ratio/high_max": 0.0,
473
- "clip_ratio/high_mean": 0.0,
474
- "clip_ratio/low_mean": 0.0,
475
- "clip_ratio/low_min": 0.0,
476
- "clip_ratio/region_mean": 0.0,
477
- "completions/clipped_ratio": 1.0,
478
- "completions/max_length": 100.0,
479
- "completions/max_terminated_length": 0.0,
480
- "completions/mean_length": 100.0,
481
- "completions/mean_terminated_length": 0.0,
482
- "completions/min_length": 100.0,
483
- "completions/min_terminated_length": 0.0,
484
- "entropy": 0.9798725090920926,
485
- "epoch": 0.18,
486
- "frac_reward_zero_std": 0.0,
487
- "grad_norm": 3.732668161392212,
488
- "learning_rate": 5.250000000000001e-07,
489
- "loss": -8.270144462585449e-08,
490
- "num_tokens": 647338.0,
491
- "reward": 0.21226384192705156,
492
- "reward_std": 0.06548679377883673,
493
- "rewards/compute_reward/mean": 0.21226384192705156,
494
- "rewards/compute_reward/std": 0.0654867960140109,
495
- "step": 180,
496
- "step_time": 10.853807216498534
497
- },
498
- {
499
- "clip_ratio/high_max": 0.0,
500
- "clip_ratio/high_mean": 0.0,
501
- "clip_ratio/low_mean": 0.0,
502
- "clip_ratio/low_min": 0.0,
503
- "clip_ratio/region_mean": 0.0,
504
- "completions/clipped_ratio": 1.0,
505
- "completions/max_length": 100.0,
506
- "completions/max_terminated_length": 0.0,
507
- "completions/mean_length": 100.0,
508
- "completions/mean_terminated_length": 0.0,
509
- "completions/min_length": 100.0,
510
- "completions/min_terminated_length": 0.0,
511
- "entropy": 0.9461549550294877,
512
- "epoch": 0.19,
513
- "frac_reward_zero_std": 0.0,
514
- "grad_norm": 3.7145590782165527,
515
- "learning_rate": 2.75e-07,
516
- "loss": -2.1532177925109862e-07,
517
- "num_tokens": 682026.0,
518
- "reward": 0.21948475018143654,
519
- "reward_std": 0.05461370516568422,
520
- "rewards/compute_reward/mean": 0.21948475018143654,
521
- "rewards/compute_reward/std": 0.05461370553821325,
522
- "step": 190,
523
- "step_time": 10.456350517399551
524
- },
525
- {
526
- "clip_ratio/high_max": 0.0,
527
- "clip_ratio/high_mean": 0.0,
528
- "clip_ratio/low_mean": 0.0,
529
- "clip_ratio/low_min": 0.0,
530
- "clip_ratio/region_mean": 0.0,
531
- "completions/clipped_ratio": 1.0,
532
- "completions/max_length": 100.0,
533
- "completions/max_terminated_length": 0.0,
534
- "completions/mean_length": 100.0,
535
- "completions/mean_terminated_length": 0.0,
536
- "completions/min_length": 100.0,
537
- "completions/min_terminated_length": 0.0,
538
- "entropy": 0.8442220821976661,
539
- "epoch": 0.2,
540
- "frac_reward_zero_std": 0.0,
541
- "grad_norm": 3.7965171337127686,
542
- "learning_rate": 2.5000000000000002e-08,
543
- "loss": 1.0430812835693359e-08,
544
- "num_tokens": 716746.0,
545
- "reward": 0.2305009976029396,
546
- "reward_std": 0.03879760131239891,
547
- "rewards/compute_reward/mean": 0.2305009976029396,
548
- "rewards/compute_reward/std": 0.03879760047420859,
549
- "step": 200,
550
- "step_time": 10.340635509999993
551
- }
552
- ],
553
- "logging_steps": 10,
554
- "max_steps": 200,
555
- "num_input_tokens_seen": 716746,
556
- "num_train_epochs": 1,
557
- "save_steps": 100,
558
- "stateful_callbacks": {
559
- "TrainerControl": {
560
- "args": {
561
- "should_epoch_stop": false,
562
- "should_evaluate": false,
563
- "should_log": false,
564
- "should_save": true,
565
- "should_training_stop": true
566
- },
567
- "attributes": {}
568
- }
569
- },
570
- "total_flos": 0.0,
571
- "train_batch_size": 2,
572
- "trial_name": null,
573
- "trial_params": null
574
- }
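The `learning_rate` entries in the deleted `trainer_state.json` above drop by exactly 2.5e-07 every 10 logged steps and end at 2.5e-08 at step 200, which is what a linear decay to zero produces. A minimal sketch that reproduces those values is below. It makes three assumptions, all inferred from the logged numbers rather than read from the training script: a base rate of 5e-6, no warmup, and a logged rate recorded before the scheduler advances on each step. `linear_decay_lr` is a hypothetical helper.

```python
def linear_decay_lr(step: int, base_lr: float = 5e-6, max_steps: int = 200) -> float:
    """Learning rate logged at `step` (1-indexed) under a warmup-free linear decay.

    Assumptions (inferred from the logged values, not from the training code):
    base_lr=5e-6, no warmup, and the logged value is the rate used for `step`,
    i.e. the scheduler has advanced step - 1 times when it is recorded.
    """
    remaining = max_steps - (step - 1)
    return base_lr * remaining / max_steps


if __name__ == "__main__":
    # (step, learning_rate) pairs copied from the deleted trainer state above.
    logged = {140: 1.525e-06, 150: 1.275e-06, 160: 1.025e-06,
              170: 7.750000000000001e-07, 180: 5.250000000000001e-07,
              190: 2.75e-07, 200: 2.5000000000000002e-08}
    for step, lr in logged.items():
        print(f"step {step}: logged {lr:.4e}, predicted {linear_decay_lr(step):.4e}")
```

With a nonzero warmup the same logged slope would imply a different base rate, so the 5e-6 figure holds only under the no-warmup assumption.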
 
training/checkpoints/unified_final/config.json DELETED
@@ -1,32 +0,0 @@
1
- {
2
- "architectures": [
3
- "LlamaForCausalLM"
4
- ],
5
- "attention_bias": false,
6
- "attention_dropout": 0.0,
7
- "bos_token_id": 1,
8
- "dtype": "float32",
9
- "eos_token_id": 2,
10
- "head_dim": 64,
11
- "hidden_act": "silu",
12
- "hidden_size": 2048,
13
- "initializer_range": 0.02,
14
- "intermediate_size": 5632,
15
- "max_position_embeddings": 2048,
16
- "mlp_bias": false,
17
- "model_type": "llama",
18
- "num_attention_heads": 32,
19
- "num_hidden_layers": 22,
20
- "num_key_value_heads": 4,
21
- "pad_token_id": 2,
22
- "pretraining_tp": 1,
23
- "rms_norm_eps": 1e-05,
24
- "rope_parameters": {
25
- "rope_theta": 10000.0,
26
- "rope_type": "default"
27
- },
28
- "tie_word_embeddings": false,
29
- "transformers_version": "5.3.0",
30
- "use_cache": false,
31
- "vocab_size": 32000
32
- }
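The dimensions in this deleted `config.json` can be turned into a parameter count. The sketch below assumes the standard Llama layout used by Hugging Face `transformers`: grouped-query attention, a gated SiLU MLP with `gate_proj`/`up_proj`/`down_proj`, RMSNorm layers with weights only, and no bias terms (the config sets `attention_bias` and `mlp_bias` to false). `llama_param_count` is a hypothetical helper, not part of the repository.

```python
def llama_param_count(vocab_size: int, hidden_size: int, intermediate_size: int,
                      num_hidden_layers: int, num_attention_heads: int,
                      num_key_value_heads: int, head_dim: int,
                      tie_word_embeddings: bool) -> int:
    """Count weights in a Llama-style decoder (no biases, RMSNorm weights only)."""
    embed = vocab_size * hidden_size
    # Attention: q_proj and o_proj span all heads; k_proj and v_proj only the KV heads.
    q_o = 2 * hidden_size * (num_attention_heads * head_dim)
    k_v = 2 * hidden_size * (num_key_value_heads * head_dim)
    # Gated MLP: gate_proj and up_proj (hidden -> intermediate), down_proj back.
    mlp = 3 * hidden_size * intermediate_size
    norms = 2 * hidden_size  # input and post-attention RMSNorm in each layer
    per_layer = q_o + k_v + mlp + norms
    final_norm = hidden_size
    # An untied output head adds a second vocab x hidden matrix.
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    return embed + num_hidden_layers * per_layer + final_norm + lm_head


if __name__ == "__main__":
    # Values copied from the deleted config.json above.
    count = llama_param_count(vocab_size=32000, hidden_size=2048, intermediate_size=5632,
                              num_hidden_layers=22, num_attention_heads=32,
                              num_key_value_heads=4, head_dim=64,
                              tie_word_embeddings=False)
    print(f"{count:,}")  # 1,100,048,384
```

Because `tie_word_embeddings` is false, the output head alone accounts for about 65.5M of the roughly 1.1B weights. With `"dtype": "float32"`, each copy of those weights takes about 4.4 GB on disk.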
 
training/checkpoints/unified_final/generation_config.json DELETED
@@ -1,9 +0,0 @@
1
- {
2
- "bos_token_id": 1,
3
- "eos_token_id": [
4
- 2
5
- ],
6
- "max_length": 2048,
7
- "pad_token_id": 2,
8
- "transformers_version": "5.3.0"
9
- }
 
training/checkpoints/unified_final/tokenizer.json DELETED
The diff for this file is too large to render. See raw diff
 
training/checkpoints/unified_final/tokenizer_config.json DELETED
@@ -1,19 +0,0 @@
1
- {
2
- "add_prefix_space": null,
3
- "backend": "tokenizers",
4
- "bos_token": "<s>",
5
- "clean_up_tokenization_spaces": false,
6
- "eos_token": "</s>",
7
- "is_local": true,
8
- "max_length": null,
9
- "model_max_length": 2048,
10
- "pad_to_multiple_of": null,
11
- "pad_token": "</s>",
12
- "pad_token_type_id": 0,
13
- "padding_side": "left",
14
- "sp_model_kwargs": {},
15
- "tokenizer_class": "LlamaTokenizer",
16
- "truncation_side": "left",
17
- "unk_token": "<unk>",
18
- "use_default_system_prompt": false
19
- }
 
training/checkpoints/unified_final/unified_reward_log.json DELETED
@@ -1,810 +0,0 @@
1
- {
2
- "accuracy": [
3
- 0.012478123821101302,
4
- 0.013689774048328765,
5
- 0.12357050236883002,
6
- 0.043150096433237195,
7
- 0.11808098944816375,
8
- 0.14478551750907398,
9
- 0.21936089415676943,
10
- 0.14560732765872023,
11
- 0.12766012796254073,
12
- 0.16228250732999258,
13
- 0.19256023689530533,
14
- 0.153446869824083,
15
- 0.08735395734236795,
16
- 0.25620539761275585,
17
- 0.2796424323605421,
18
- 0.4050695781981913,
19
- 0.34320680785281277,
20
- 0.39042326634482405,
21
- 0.24141882976569753,
22
- 0.2882491476114424,
23
- 0.2805112680700598,
24
- 0.1299182187184869,
25
- 0.18283964773559502,
26
- 0.08174918994377885,
27
- 0.1305077084983307,
28
- 0.15188368799701088,
29
- 0.10731278214010087,
30
- 0.10817607256366782,
31
- 0.1742403849902705,
32
- 0.15966549523684162,
33
- 0.21224383614993403,
34
- 0.30634267989144903,
35
- 0.2563189622014761,
36
- 0.13088561721084532,
37
- 0.23896305011421776,
38
- 0.36338720554077614,
39
- 0.2743395734578371,
40
- 0.2785670698390685,
41
- 0.26690704237418583,
42
- 0.23420825800444123,
43
- 0.4486492634482796,
44
- 0.3085314377908274,
45
- 0.27236165767163295,
46
- 0.351135627192783,
47
- 0.37157259147763155,
48
- 0.4091061054548437,
49
- 0.3321387716436809,
50
- 0.25690332708634805,
51
- 0.4042620632377111,
52
- 0.21426805183517378,
53
- 0.46486986328175767,
54
- 0.5354255396266014,
55
- 0.5316739152617584,
56
- 0.3626249278251227,
57
- 0.5560084815324287,
58
- 0.47374602488847506,
59
- 0.5622030981309204,
60
- 0.6260334739834723,
61
- 0.5388746766273916,
62
- 0.43546972183358157,
63
- 0.4384314355118149,
64
- 0.43255371653260083,
65
- 0.382003842773009,
66
- 0.33916141995282467,
67
- 0.4102824234143368,
68
- 0.4002692943218704,
69
- 0.4433627484561765,
70
- 0.5707634448719365,
71
- 0.3326736211199734,
72
- 0.41868448313128437,
73
- 0.4830820909726724,
74
- 0.5073173724203757,
75
- 0.6011403764343056,
76
- 0.2652010267221505,
77
- 0.5708498617899997,
78
- 0.5372080254474398,
79
- 0.34268688791221447,
80
- 0.36077516272765764,
81
- 0.6577040443039563,
82
- 0.5249539674929385,
83
- 0.3393068936409599,
84
- 0.3981918416905377,
85
- 0.5998766558760262,
86
- 0.3886278953534839,
87
- 0.47030574201103836,
88
- 0.5933578772929455,
89
- 0.629797753552287,
90
- 0.6829957361516797,
91
- 0.5975855789903534,
92
- 0.37033629002672747,
93
- 0.40129960235208273,
94
- 0.44104763492941856,
95
- 0.5250475457257945,
96
- 0.5792574424612014,
97
- 0.25491493314992414,
98
- 0.4456432306425367,
99
- 0.3674802188566988,
100
- 0.5168529125349757,
101
- 0.7135775878197881,
102
- 0.408872426591652,
103
- 0.29645813006976085,
104
- 0.5807047440217663,
105
- 0.3951396545427582,
106
- 0.5820897600332913,
107
- 0.5751887943251881,
108
- 0.6462836385320105,
109
- 0.452535930180199,
110
- 0.6309295986678539,
111
- 0.521345004487674,
112
- 0.7523772581521466,
113
- 0.3868275580258203,
114
- 0.6621844534173644,
115
- 0.757102247782526,
116
- 0.7496667811480936,
117
- 0.765902349873787,
118
- 0.7620735178706088,
119
- 0.8005386810387373,
120
- 0.7600417191929723,
121
- 0.7790964529097753,
122
- 0.8060362095807505,
123
- 0.6639245812548539,
124
- 0.49642928937921477,
125
- 0.4622820479255877,
126
- 0.5039745619269863,
127
- 0.5521504355740943,
128
- 0.763103948879152,
129
- 0.3649169562800698,
130
- 0.8642640291197355,
131
- 0.7673212948914258,
132
- 0.6856467187291327,
133
- 0.6203947744628628,
134
- 0.635864180446877,
135
- 0.7076110516058842,
136
- 0.45257112707172986,
137
- 0.4927382976084982,
138
- 0.735338338570779,
139
- 0.7325108773598185,
140
- 0.5286115260781837,
141
- 0.6873601944038981,
142
- 0.7558585478414992,
143
- 0.8025525164825894,
144
- 0.5403924472630024,
145
- 0.8109585656614495,
146
- 0.45960476465808653,
147
- 0.7726514123926349,
148
- 0.78036072270019,
149
- 0.5612159043391909,
150
- 0.668619691132455,
151
- 0.7187997825397312,
152
- 0.6008389099901545,
153
- 0.5160061409523324,
154
- 0.6712722339255528,
155
- 0.25213094055121654,
156
- 0.7931299787283417,
157
- 0.5770709363152806,
158
- 0.3674653100689218,
159
- 0.7533031922202384,
160
- 0.5477579357220128,
161
- 0.9013020257140825,
162
- 0.774595058715597,
163
- 0.5444791193214735,
164
- 0.28536322558907645,
165
- 0.8018009673613502,
166
- 0.7534115956222964,
167
- 0.8178817865612724,
168
- 0.7691389758719754,
169
- 0.746364161759599,
170
- 0.7686015134039534,
171
- 0.734219302571865,
172
- 0.32221002464589255,
173
- 0.47941368112339633,
174
- 0.7168057798061833,
175
- 0.772261652825011,
176
- 0.5291935548529084,
177
- 0.7485607594114032,
178
- 0.5932522241567504,
179
- 0.5648661194163807,
180
- 0.5709367030781823,
181
- 0.7752278802176389,
182
- 0.6248770881515031,
183
- 0.5446761697530746,
184
- 0.8044651419608864,
185
- 0.855248827897706,
186
- 0.5436122580157401,
187
- 0.9085174062877894,
188
- 0.31500336882736524,
189
- 0.6913784691774245,
190
- 0.5400797382818436,
191
- 0.6050753133365693,
192
- 0.7986505120673587,
193
- 0.8202528873914283,
194
- 0.6996518377501237,
195
- 0.8313200483947909,
196
- 0.4808844911385792,
197
- 0.7306097140061414,
198
- 0.5058602896511918,
199
- 0.6438089653119033,
200
- 0.7879260241436392,
201
- 0.8337068369817564,
202
- 0.537435884385747
203
- ],
204
- "outcome": [
205
- 0.4,
206
- 0.42500000000000004,
207
- 0.4375,
208
- 0.42500000000000004,
209
- 0.4,
210
- 0.4,
211
- 0.4,
212
- 0.25,
213
- 0.4,
214
- 0.0,
215
- 0.0,
216
- 0.0,
217
- 0.0,
218
- 0.07500000000000001,
219
- 0.025,
220
- 0.07500000000000001,
221
- 0.0,
222
- 0.07500000000000001,
223
- 0.05,
224
- 0.07500000000000001,
225
- 0.225,
226
- 0.4,
227
- 0.4,
228
- 0.4,
229
- 0.42500000000000004,
230
- 0.4,
231
- 0.4,
232
- 0.4,
233
- 0.4,
234
- 0.4,
235
- 0.35000000000000003,
236
- 0.175,
237
- 0.15,
238
- 0.15000000000000002,
239
- 0.07500000000000001,
240
- 0.17500000000000002,
241
- 0.1,
242
- 0.0,
243
- 0.05,
244
- 0.07500000000000001,
245
- 0.07500000000000001,
246
- 0.07500000000000001,
247
- 0.025,
248
- 0.0,
249
- 0.0,
250
- 0.0,
251
- 0.07500000000000001,
252
- 0.15000000000000002,
253
- 0.0,
254
- 0.05,
255
- 0.0,
256
- 0.025,
257
- 0.0,
258
- 0.0,
259
- 0.0,
260
- 0.05,
261
- 0.0,
262
- 0.05,
263
- 0.025,
264
- 0.07500000000000001,
265
- 0.0,
266
- 0.05,
267
- 0.025,
268
- 0.1,
269
- 0.025,
270
- 0.025,
271
- 0.025,
272
- 0.025,
273
- 0.0,
274
- 0.05,
275
- 0.05,
276
- 0.0,
277
- 0.05,
278
- 0.0,
279
- 0.0,
280
- 0.025,
281
- 0.05,
282
- 0.025,
283
- 0.0,
284
- 0.025,
285
- 0.05,
286
- 0.07500000000000001,
287
- 0.125,
288
- 0.25,
289
- 0.125,
290
- 0.2,
291
- 0.05,
292
- 0.17500000000000002,
293
- 0.225,
294
- 0.2,
295
- 0.30000000000000004,
296
- 0.375,
297
- 0.35,
298
- 0.42500000000000004,
299
- 0.35000000000000003,
300
- 0.42500000000000004,
301
- 0.4,
302
- 0.4,
303
- 0.4,
304
- 0.42500000000000004,
305
- 0.42500000000000004,
306
- 0.45,
307
- 0.4,
308
- 0.4,
309
- 0.4,
310
- 0.4,
311
- 0.4,
312
- 0.4,
313
- 0.45,
314
- 0.35000000000000003,
315
- 0.4,
316
- 0.4,
317
- 0.4,
318
- 0.35000000000000003,
319
- 0.4,
320
- 0.4,
321
- 0.25,
322
- 0.25,
323
- 0.35000000000000003,
324
- 0.4,
325
- 0.35000000000000003,
326
- 0.30000000000000004,
327
- 0.4,
328
- 0.35000000000000003,
329
- 0.35000000000000003,
330
- 0.35000000000000003,
331
- 0.4,
332
- 0.35000000000000003,
333
- 0.35000000000000003,
334
- 0.2,
335
- 0.35000000000000003,
336
- 0.4,
337
- 0.35000000000000003,
338
- 0.42500000000000004,
339
- 0.4,
340
- 0.30000000000000004,
341
- 0.4,
342
- 0.4,
343
- 0.42500000000000004,
344
- 0.42500000000000004,
345
- 0.4,
346
- 0.42500000000000004,
347
- 0.4,
348
- 0.4,
349
- 0.35000000000000003,
350
- 0.42500000000000004,
351
- 0.30000000000000004,
352
- 0.42500000000000004,
353
- 0.4,
354
- 0.4,
355
- 0.4,
356
- 0.42500000000000004,
357
- 0.4,
358
- 0.35000000000000003,
359
- 0.4,
360
- 0.42500000000000004,
361
- 0.4,
362
- 0.42500000000000004,
363
- 0.25,
364
- 0.35000000000000003,
365
- 0.4,
366
- 0.4,
367
- 0.35000000000000003,
368
- 0.4,
369
- 0.4,
370
- 0.35000000000000003,
371
- 0.4,
372
- 0.4,
373
- 0.4,
374
- 0.4,
375
- 0.4,
376
- 0.4,
377
- 0.4,
378
- 0.42500000000000004,
379
- 0.4,
380
- 0.4,
381
- 0.4,
382
- 0.375,
383
- 0.4,
384
- 0.375,
385
- 0.4,
386
- 0.35000000000000003,
387
- 0.4,
388
- 0.4,
389
- 0.35000000000000003,
390
- 0.42500000000000004,
391
- 0.4,
392
- 0.4,
393
- 0.42500000000000004,
394
- 0.4,
395
- 0.4,
396
- 0.4,
397
- 0.4,
398
- 0.45,
399
- 0.4,
400
- 0.4,
401
- 0.4,
402
- 0.35000000000000003,
403
- 0.4,
404
- 0.4
405
- ],
406
- "bluff": [
407
- -0.5,
408
- -0.5,
409
- -0.5,
410
- -0.5,
411
- -0.5,
412
- -0.5,
413
- -0.5,
414
- -0.5,
415
- -0.5,
416
- -0.5,
417
- -0.5,
418
- -0.5,
419
- -0.5,
420
- -0.5,
421
- -0.5,
422
- -0.5,
423
- -0.5,
424
- -0.5,
425
- -0.5,
426
- -0.5,
427
- -0.5,
428
- -0.5,
429
- -0.5,
430
- -0.5,
431
- -0.5,
432
- -0.5,
433
- -0.5,
434
- -0.5,
435
- -0.5,
436
- -0.5,
437
- -0.5,
438
- -0.5,
439
- -0.5,
440
- -0.5,
441
- -0.5,
442
- -0.5,
443
- -0.5,
444
- -0.5,
445
- -0.5,
446
- -0.5,
447
- -0.5,
448
- -0.5,
449
- -0.5,
450
- -0.5,
451
- -0.5,
452
- -0.5,
453
- -0.5,
454
- -0.5,
455
- -0.5,
456
- -0.5,
457
- -0.5,
458
- -0.5,
459
- -0.5,
460
- -0.5,
461
- -0.5,
462
- -0.5,
463
- -0.5,
464
- -0.5,
465
- -0.5,
466
- -0.5,
467
- -0.5,
468
- -0.5,
469
- -0.5,
470
- -0.5,
471
- -0.5,
472
- -0.5,
473
- -0.5,
474
- -0.5,
475
- -0.5,
476
- -0.5,
477
- -0.5,
478
- -0.5,
479
- -0.5,
480
- -0.5,
481
- -0.5,
482
- -0.5,
483
- -0.5,
484
- -0.5,
485
- -0.5,
486
- -0.5,
487
- -0.5,
488
- -0.5,
489
- -0.5,
490
- -0.5,
491
- -0.5,
492
- -0.5,
493
- -0.5,
494
- -0.5,
495
- -0.5,
496
- -0.5,
497
- -0.5,
498
- -0.5,
499
- -0.5,
500
- -0.5,
501
- -0.5,
502
- -0.5,
503
- -0.5,
504
- -0.5,
505
- -0.5,
506
- -0.5,
507
- -0.5,
508
- -0.5,
509
- -0.5,
510
- -0.5,
511
- -0.5,
512
- -0.5,
513
- -0.5,
514
- -0.5,
515
- -0.5,
516
- -0.5,
517
- -0.5,
518
- -0.5,
519
- -0.5,
520
- -0.5,
521
- -0.5,
522
- -0.5,
523
- -0.5,
524
- -0.5,
525
- -0.5,
526
- -0.5,
527
- -0.5,
528
- -0.5,
529
- -0.5,
530
- -0.5,
531
- -0.5,
532
- -0.5,
533
- -0.5,
534
- -0.5,
535
- -0.5,
536
- -0.5,
537
- -0.5,
538
- -0.5,
539
- -0.5,
540
- -0.5,
541
- -0.5,
542
- -0.5,
543
- -0.5,
544
- -0.5,
545
- -0.5,
546
- -0.5,
547
- -0.5,
548
- -0.5,
549
- -0.5,
550
- -0.5,
551
- -0.5,
552
- -0.5,
553
- -0.5,
554
- -0.5,
555
- -0.5,
556
- -0.5,
557
- -0.5,
558
- -0.5,
559
- -0.5,
560
- -0.5,
561
- -0.5,
562
- -0.5,
563
- -0.5,
564
- -0.5,
565
- -0.5,
566
- -0.5,
567
- -0.5,
568
- -0.5,
569
- -0.5,
570
- -0.5,
571
- -0.5,
572
- -0.5,
573
- -0.5,
574
- -0.5,
575
- -0.5,
576
- -0.5,
577
- -0.5,
578
- -0.5,
579
- -0.5,
580
- -0.5,
581
- -0.5,
582
- -0.5,
583
- -0.5,
584
- -0.5,
585
- -0.5,
586
- -0.5,
587
- -0.5,
588
- -0.5,
589
- -0.5,
590
- -0.5,
591
- -0.5,
592
- -0.5,
593
- -0.5,
594
- -0.5,
595
- -0.5,
596
- -0.5,
597
- -0.5,
598
- -0.5,
599
- -0.5,
600
- -0.5,
601
- -0.5,
602
- -0.5,
603
- -0.5,
604
- -0.5,
605
- -0.5,
606
- -0.5
607
- ],
608
- "total": [
609
- -0.005632656662614553,
610
- 0.0035414209169150612,
611
- 0.0463746758290905,
612
- 0.01385253375163301,
613
- 0.031328346306857296,
614
- 0.040674931128175884,
615
- 0.06677631295486929,
616
- -0.011537435319447932,
617
- 0.03468104478688924,
618
- -0.0932011224345026,
619
- -0.08260391708664314,
620
- -0.09629359556157094,
621
- -0.11942611493017122,
622
- -0.03407811083553544,
623
- -0.04337514867381029,
624
- 0.018024352369366947,
625
- -0.02987761725151553,
626
- 0.012898143220688411,
627
- -0.04800340958200586,
628
- -0.022862798335995183,
629
- 0.026928943824520928,
630
- 0.03547137655147041,
631
- 0.05399387670745824,
632
- 0.018612216480322585,
633
- 0.044427697974415745,
634
- 0.043159290798953795,
635
- 0.027559473749035293,
636
- 0.02786162539728372,
637
- 0.05098413474659466,
638
- 0.045882923332894544,
639
- 0.04678534265247689,
640
- 0.018469937962007153,
641
- -0.007788363229483373,
642
- -0.05169003397620414,
643
- -0.04011293246002378,
644
- 0.03843552193927165,
645
- -0.018981149289757013,
646
- -0.05250152555632605,
647
- -0.039082535169034954,
648
- -0.04177710969844557,
649
- 0.033277242206897865,
650
- -0.015763996773210408,
651
- -0.045923419814928465,
652
- -0.02710253048252593,
653
- -0.019949592982828956,
654
- -0.006812863090804698,
655
- -0.007501429924711707,
656
- -0.007583835519778186,
657
- -0.008508277866801141,
658
- -0.05750618185768919,
659
- 0.012704452148615191,
660
- 0.0461489388693105,
661
- 0.036085870341615436,
662
- -0.023081275261207068,
663
- 0.04460296853635004,
664
- 0.03331110871096628,
665
- 0.04677108434582211,
666
- 0.0866117158942153,
667
- 0.04735613681958707,
668
- 0.02866440264175356,
669
- 0.0034510024291352186,
670
- 0.01889380078641028,
671
- -0.00754865502944687,
672
- 0.0037064969834886344,
673
- 0.0023488481950178913,
674
- -0.001155746987345354,
675
- 0.013926961959661782,
676
- 0.058517205705177766,
677
- -0.03356423260800931,
678
- 0.014039569095949535,
679
- 0.03657873184043532,
680
- 0.02756108034713149,
681
- 0.07789913175200697,
682
- -0.05717964064724733,
683
- 0.04979745162649989,
684
- 0.04677280890660393,
685
- -0.012559589230724939,
686
- -0.014978693045319853,
687
- 0.08019641550638473,
688
- 0.04248388862252848,
689
- -0.01374258722566403,
690
- 0.015617144591688177,
691
- 0.10370682955660918,
692
- 0.07351976337371936,
693
- 0.05835700970386343,
694
- 0.12767525705253094,
695
- 0.08792921374330046,
696
- 0.1502985076530879,
697
- 0.13790495264662364,
698
- 0.049617701509354614,
699
- 0.09545486082322892,
700
- 0.13561667222529647,
701
- 0.15626664100402804,
702
- 0.2014901048614205,
703
- 0.06172022660247342,
704
- 0.15472513072488783,
705
- 0.11861807659984457,
706
- 0.1708985193872415,
707
- 0.23975215573692582,
708
- 0.1418553493070782,
709
- 0.10251034552441629,
710
- 0.21074666040761822,
711
- 0.12829887908996535,
712
- 0.19373141601165192,
713
- 0.19131607801381584,
714
- 0.21619927348620369,
715
- 0.1483875755630696,
716
- 0.2108253595337488,
717
- 0.18997075157068588,
718
- 0.23583204035325128,
719
- 0.12538964530903712,
720
- 0.22176455869607747,
721
- 0.25498578672388406,
722
- 0.2348833734018327,
723
- 0.25806582245582543,
724
- 0.256725731254713,
725
- 0.217688538363558,
726
- 0.20351460171754027,
727
- 0.24518375851842128,
728
- 0.2721126733532626,
729
- 0.2048736034391988,
730
- 0.12875025128272513,
731
- 0.15179871677395568,
732
- 0.14889109667444517,
733
- 0.16575265245093296,
734
- 0.23958638210770317,
735
- 0.11772093469802442,
736
- 0.27499241019190734,
737
- 0.24106245321199898,
738
- 0.15997635155519643,
739
- 0.18963817106200198,
740
- 0.21255246315640697,
741
- 0.22016386806205945,
742
- 0.1571498944751054,
743
- 0.16245840416297436,
744
- 0.21236841849977267,
745
- 0.24637880707593643,
746
- 0.17501403412736427,
747
- 0.23932606804136433,
748
- 0.2633004917445247,
749
- 0.27089338076890623,
750
- 0.1878873565420508,
751
- 0.2738354979815073,
752
- 0.15086166763033024,
753
- 0.24292799433742218,
754
- 0.27187625294506645,
755
- 0.1514255665187168,
756
- 0.2327668918963592,
757
- 0.24157992388890587,
758
- 0.20029361849655403,
759
- 0.1706021493333163,
760
- 0.23369528187394348,
761
- 0.07824582919292578,
762
- 0.25009549255491953,
763
- 0.19197482771034816,
764
- 0.1273628585241226,
765
- 0.25365611727708337,
766
- 0.19046527750270448,
767
- 0.25295570899992886,
768
- 0.24360827055045886,
769
- 0.1805676917625157,
770
- 0.08987712895617675,
771
- 0.25313033857647255,
772
- 0.25369405846780374,
773
- 0.2762586252964453,
774
- 0.24169864155519138,
775
- 0.2512274566158596,
776
- 0.25901052969138366,
777
- 0.24697675590015272,
778
- 0.10277350862606237,
779
- 0.1577947883931887,
780
- 0.2408820229321641,
781
- 0.2602915784887538,
782
- 0.1839677441985179,
783
- 0.2519962657939911,
784
- 0.19763827845486265,
785
- 0.18770314179573322,
786
- 0.1810778460773638,
787
- 0.26132975807617365,
788
- 0.1999569808530261,
789
- 0.1806366594135761,
790
- 0.2540627996863101,
791
- 0.28933708976419703,
792
- 0.18026429030550906,
793
- 0.2904810922007262,
794
- 0.10900117908957782,
795
- 0.2319824642120985,
796
- 0.17902790839864524,
797
- 0.2105263596677992,
798
- 0.26952767922357546,
799
- 0.27708851058699985,
800
- 0.23487814321254327,
801
- 0.2809620169381768,
802
- 0.1758095718985027,
803
- 0.2457133999021494,
804
- 0.1670511013779171,
805
- 0.21533313785916613,
806
- 0.2482741084502737,
807
- 0.2817973929436147,
808
- 0.1781025595350114
809
- ]
810
- }
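The `total` entries in the deleted `unified_reward_log.json` appear to be a weighted sum of the same step's components: 0.35 × `accuracy` + 0.35 × `outcome` + 0.30 × `bluff`. Spot checks, including the first and last entries, agree to floating-point precision. These weights are inferred from the logged values, not taken from the training code. A minimal sketch of that relation follows. The helper names and weight constants are assumptions.

```python
from typing import Mapping, Sequence

# Weights inferred from the logged values (assumption, not read from the training script).
ACCURACY_WEIGHT = 0.35
OUTCOME_WEIGHT = 0.35
BLUFF_WEIGHT = 0.30


def unified_total(accuracy: float, outcome: float, bluff: float) -> float:
    """Recombine one step's reward components into the logged `total`."""
    return ACCURACY_WEIGHT * accuracy + OUTCOME_WEIGHT * outcome + BLUFF_WEIGHT * bluff


def max_mismatch(log: Mapping[str, Sequence[float]]) -> float:
    """Largest absolute gap between recomputed and logged totals.

    `log` has the same shape as the deleted JSON file: parallel lists under
    the keys "accuracy", "outcome", "bluff" and "total".
    """
    acc, out, blf, tot = (log[k] for k in ("accuracy", "outcome", "bluff", "total"))
    if not (len(acc) == len(out) == len(blf) == len(tot)):
        raise ValueError("component lists must have the same length")
    return max(abs(unified_total(a, o, b) - t) for a, o, b, t in zip(acc, out, blf, tot))


if __name__ == "__main__":
    # First and last entries copied from the deleted log above.
    sample = {
        "accuracy": [0.012478123821101302, 0.537435884385747],
        "outcome": [0.4, 0.4],
        "bluff": [-0.5, -0.5],
        "total": [-0.005632656662614553, 0.1781025595350114],
    }
    print(max_mismatch(sample) < 1e-12)  # True
```

Every logged `bluff` value in this run is -0.5, so under these weights the third term contributes a constant -0.15 to each total. The variation in `total` therefore comes entirely from `accuracy` and `outcome`.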
 
training/unified_training.log DELETED
@@ -1,269 +0,0 @@
1
- Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
2
- Loading Phase 2 checkpoint...
3
-
4
- BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
5
- Key | Status | |
6
- ------------------------+------------+--+-
7
- embeddings.position_ids | UNEXPECTED | |
8
-
9
- Notes:
10
- - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
11
- Building training dataset from selfplay states...
12
-
13
- Starting unified training...
14
- Loading from: training/checkpoints/phase2_final
15
- Saving to: training/checkpoints/unified_final
16
- ==================================================
17
-
18
  0%| | 0/200 [00:00<?, ?it/s]Passing `generation_config` together with generation-related arguments=({'disable_compile'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.
19
- Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
20
-
21
-
22
- DistilBertModel LOAD REPORT from: distilbert-base-uncased
23
- Key | Status | |
24
- ------------------------+------------+--+-
25
- vocab_projector.bias | UNEXPECTED | |
26
- vocab_layer_norm.weight | UNEXPECTED | |
27
- vocab_layer_norm.bias | UNEXPECTED | |
28
- vocab_transform.weight | UNEXPECTED | |
29
- vocab_transform.bias | UNEXPECTED | |
30
-
31
- Notes:
32
- - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
33
-
34
  0%| | 1/200 [00:14<48:31, 14.63s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
35
-
36
  1%| | 2/200 [00:25<40:55, 12.40s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
37
-
38
  2%|▏ | 3/200 [00:37<40:52, 12.45s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
39
-
40
  2%|▏ | 4/200 [00:50<41:02, 12.56s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
41
-
42
  2%|▎ | 5/200 [01:03<40:57, 12.60s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
43
-
44
  3%|▎ | 6/200 [01:14<39:37, 12.26s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
45
-
46
  4%|▎ | 7/200 [01:27<39:44, 12.36s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
47
-
48
  4%|▍ | 8/200 [01:39<38:51, 12.14s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
49
-
50
  4%|▍ | 9/200 [01:50<37:33, 11.80s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
51
-
52
  5%|▌ | 10/200 [02:01<36:55, 11.66s/it]
53
 
54
  5%|▌ | 10/200 [02:01<36:55, 11.66s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
55
-
56
  6%|▌ | 11/200 [02:14<37:32, 11.92s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
57
-
58
  6%|▌ | 12/200 [02:25<36:31, 11.66s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  6%|▋ | 13/200 [02:36<36:00, 11.56s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  7%|▋ | 14/200 [02:47<35:24, 11.42s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  8%|▊ | 15/200 [03:00<36:25, 11.81s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  8%|▊ | 16/200 [03:12<36:28, 11.89s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  8%|▊ | 17/200 [03:25<37:13, 12.20s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  9%|▉ | 18/200 [03:37<36:45, 12.12s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  10%|▉ | 19/200 [03:48<36:05, 11.97s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  10%|█ | 20/200 [04:01<36:13, 12.07s/it]
  10%|█ | 20/200 [04:01<36:13, 12.07s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  10%|█ | 21/200 [04:14<36:57, 12.39s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  11%|█ | 22/200 [04:25<35:33, 11.99s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  12%|█▏ | 23/200 [04:37<35:37, 12.07s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  12%|█▏ | 24/200 [04:50<35:42, 12.18s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  12%|█▎ | 25/200 [05:00<34:15, 11.75s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  13%|█▎ | 26/200 [05:14<35:27, 12.23s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  14%|█▎ | 27/200 [05:25<34:26, 11.95s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  14%|█▍ | 28/200 [05:36<33:26, 11.66s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  14%|█▍ | 29/200 [05:49<34:30, 12.11s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  15%|█▌ | 30/200 [06:02<34:38, 12.23s/it]
  15%|█▌ | 30/200 [06:02<34:38, 12.23s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  16%|█▌ | 31/200 [06:13<33:24, 11.86s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  16%|█▌ | 32/200 [06:25<33:57, 12.13s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  16%|█▋ | 33/200 [06:38<34:16, 12.31s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  17%|█▋ | 34/200 [06:49<33:09, 11.99s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  18%|█▊ | 35/200 [07:00<32:13, 11.72s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  18%|█▊ | 36/200 [07:13<33:05, 12.11s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  18%|█▊ | 37/200 [07:24<32:02, 11.79s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  19%|█▉ | 38/200 [07:38<33:23, 12.37s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  20%|█▉ | 39/200 [07:52<34:03, 12.70s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  20%|██ | 40/200 [08:03<32:28, 12.18s/it]
  20%|██ | 40/200 [08:03<32:28, 12.18s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  20%|██ | 41/200 [08:15<32:11, 12.15s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  21%|██ | 42/200 [08:26<31:37, 12.01s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  22%|██▏ | 43/200 [08:37<30:27, 11.64s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  22%|██▏ | 44/200 [08:50<31:14, 12.01s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  22%|██▎ | 45/200 [09:03<31:47, 12.30s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  23%|██▎ | 46/200 [09:15<31:12, 12.16s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  24%|██▎ | 47/200 [09:27<30:57, 12.14s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  24%|██▍ | 48/200 [09:38<30:09, 11.90s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  24%|██▍ | 49/200 [09:52<31:30, 12.52s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  25%|██▌ | 50/200 [10:03<30:13, 12.09s/it]
  25%|██▌ | 50/200 [10:04<30:13, 12.09s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  26%|██▌ | 51/200 [10:18<31:43, 12.77s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  26%|██▌ | 52/200 [10:30<31:11, 12.64s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  26%|██▋ | 53/200 [10:42<30:30, 12.45s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  27%|██▋ | 54/200 [10:56<31:35, 12.98s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  28%|██▊ | 55/200 [11:10<31:37, 13.09s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  28%|██▊ | 56/200 [11:22<30:41, 12.79s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  28%|██▊ | 57/200 [11:33<29:33, 12.41s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  29%|██▉ | 58/200 [11:46<29:53, 12.63s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  30%|██▉ | 59/200 [11:59<29:27, 12.53s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  30%|███ | 60/200 [12:10<28:30, 12.22s/it]
  30%|███ | 60/200 [12:10<28:30, 12.22s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  30%|███ | 61/200 [12:22<27:57, 12.07s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  31%|███ | 62/200 [12:35<28:26, 12.37s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  32%|███▏ | 63/200 [12:48<28:43, 12.58s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  32%|███▏ | 64/200 [12:59<27:41, 12.21s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  32%|███▎ | 65/200 [13:12<27:59, 12.44s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  33%|███▎ | 66/200 [13:25<27:39, 12.39s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  34%|███▎ | 67/200 [13:37<27:13, 12.28s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  34%|███▍ | 68/200 [13:49<27:13, 12.37s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  34%|███▍ | 69/200 [14:00<26:03, 11.94s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  35%|███▌ | 70/200 [14:11<25:18, 11.68s/it]
  35%|███▌ | 70/200 [14:11<25:18, 11.68s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  36%|███▌ | 71/200 [14:24<25:42, 11.95s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  36%|███▌ | 72/200 [14:35<24:53, 11.67s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  36%|███▋ | 73/200 [14:48<25:25, 12.01s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  37%|███▋ | 74/200 [14:58<24:11, 11.52s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  38%|███▊ | 75/200 [15:11<24:38, 11.83s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  38%|███▊ | 76/200 [15:22<24:14, 11.73s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  38%|███▊ | 77/200 [15:33<23:19, 11.38s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  39%|███▉ | 78/200 [15:44<23:00, 11.31s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  40%|███▉ | 79/200 [15:57<23:41, 11.75s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  40%|████ | 80/200 [16:08<23:32, 11.77s/it]
  40%|████ | 80/200 [16:09<23:32, 11.77s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  40%|████ | 81/200 [16:20<23:29, 11.84s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  41%|████ | 82/200 [16:32<23:06, 11.75s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  42%|████▏ | 83/200 [16:43<22:32, 11.56s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  42%|████▏ | 84/200 [16:55<22:24, 11.59s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  42%|████▎ | 85/200 [17:06<22:05, 11.52s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  43%|████▎ | 86/200 [17:18<22:11, 11.68s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  44%|████▎ | 87/200 [17:30<22:11, 11.79s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  44%|████▍ | 88/200 [17:42<21:53, 11.73s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  44%|████▍ | 89/200 [17:53<21:22, 11.55s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  45%|████▌ | 90/200 [18:04<21:07, 11.52s/it]
  45%|████▌ | 90/200 [18:05<21:07, 11.52s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  46%|████▌ | 91/200 [18:17<21:21, 11.76s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  46%|████▌ | 92/200 [18:30<21:45, 12.09s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  46%|████▋ | 93/200 [18:41<21:18, 11.95s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  47%|████▋ | 94/200 [18:52<20:46, 11.76s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  48%|████▊ | 95/200 [19:03<20:08, 11.51s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  48%|████▊ | 96/200 [19:15<20:08, 11.62s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  48%|████▊ | 97/200 [19:28<20:36, 12.01s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  49%|████▉ | 98/200 [19:41<20:57, 12.33s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  50%|████▉ | 99/200 [19:54<21:03, 12.51s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
  50%|█████ | 100/200 [20:05<20:03, 12.04s/it]
  50%|█████ | 100/200 [20:05<20:03, 12.04s/it]{'loss': '3.099e-07', 'grad_norm': '0.7345', 'learning_rate': '4.775e-06', 'num_tokens': '3.58e+04', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.01269', 'rewards/compute_reward/std': '0.02462', 'reward': '0.01269', 'reward_std': '0.02462', 'frac_reward_zero_std': '0', 'entropy': '1.357', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.71', 'epoch': '0.01'}
253
- {'loss': '-4.247e-07', 'grad_norm': '4.442', 'learning_rate': '4.525e-06', 'num_tokens': '7.175e+04', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '-0.04456', 'rewards/compute_reward/std': '0.03567', 'reward': '-0.04456', 'reward_std': '0.03567', 'frac_reward_zero_std': '0', 'entropy': '1.317', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.64', 'epoch': '0.02'}
254
- {'loss': '-0.001816', 'grad_norm': '3.017', 'learning_rate': '4.275e-06', 'num_tokens': '1.082e+05', 'completions/mean_length': '99.86', 'completions/min_length': '98.9', 'completions/max_length': '100', 'completions/clipped_ratio': '0.9875', 'completions/mean_terminated_length': '8.9', 'completions/min_terminated_length': '8.9', 'completions/max_terminated_length': '8.9', 'rewards/compute_reward/mean': '0.03749', 'rewards/compute_reward/std': '0.02062', 'reward': '0.03749', 'reward_std': '0.02062', 'frac_reward_zero_std': '0', 'entropy': '1.006', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.76', 'epoch': '0.03'}
255
- {'loss': '-0.006361', 'grad_norm': '5.866', 'learning_rate': '4.025e-06', 'num_tokens': '1.434e+05', 'completions/mean_length': '99.58', 'completions/min_length': '96.6', 'completions/max_length': '100', 'completions/clipped_ratio': '0.9875', 'completions/mean_terminated_length': '6.6', 'completions/min_terminated_length': '6.6', 'completions/max_terminated_length': '6.6', 'rewards/compute_reward/mean': '-0.01482', 'rewards/compute_reward/std': '0.067', 'reward': '-0.01482', 'reward_std': '0.067', 'frac_reward_zero_std': '0', 'entropy': '1.782', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.79', 'epoch': '0.04'}
256
- {'loss': '-0.01103', 'grad_norm': '6.191', 'learning_rate': '3.775e-06', 'num_tokens': '1.789e+05', 'completions/mean_length': '99.12', 'completions/min_length': '93', 'completions/max_length': '100', 'completions/clipped_ratio': '0.9875', 'completions/mean_terminated_length': '3', 'completions/min_terminated_length': '3', 'completions/max_terminated_length': '3', 'rewards/compute_reward/mean': '-0.01634', 'rewards/compute_reward/std': '0.05182', 'reward': '-0.01634', 'reward_std': '0.05182', 'frac_reward_zero_std': '0', 'entropy': '2.131', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.74', 'epoch': '0.05'}
257
- {'loss': '0.001951', 'grad_norm': '8.536', 'learning_rate': '3.525e-06', 'num_tokens': '2.163e+05', 'completions/mean_length': '99.85', 'completions/min_length': '98.8', 'completions/max_length': '100', 'completions/clipped_ratio': '0.9875', 'completions/mean_terminated_length': '8.8', 'completions/min_terminated_length': '8.8', 'completions/max_terminated_length': '8.8', 'rewards/compute_reward/mean': '0.03592', 'rewards/compute_reward/std': '0.04931', 'reward': '0.03592', 'reward_std': '0.04931', 'frac_reward_zero_std': '0', 'entropy': '2.104', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '11.27', 'epoch': '0.06'}
258
- {'loss': '-0.02075', 'grad_norm': '6.919', 'learning_rate': '3.275e-06', 'num_tokens': '2.511e+05', 'completions/mean_length': '92.96', 'completions/min_length': '70.1', 'completions/max_length': '100', 'completions/clipped_ratio': '0.8', 'completions/mean_terminated_length': '38.51', 'completions/min_terminated_length': '30.1', 'completions/max_terminated_length': '48.2', 'rewards/compute_reward/mean': '0.007262', 'rewards/compute_reward/std': '0.08027', 'reward': '0.007262', 'reward_std': '0.08027', 'frac_reward_zero_std': '0', 'entropy': '1.647', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.77', 'epoch': '0.07'}
- {'loss': '0.00825', 'grad_norm': '4.918', 'learning_rate': '3.025e-06', 'num_tokens': '2.857e+05', 'completions/mean_length': '99.14', 'completions/min_length': '93.1', 'completions/max_length': '100', 'completions/clipped_ratio': '0.9875', 'completions/mean_terminated_length': '3.1', 'completions/min_terminated_length': '3.1', 'completions/max_terminated_length': '3.1', 'rewards/compute_reward/mean': '0.02766', 'rewards/compute_reward/std': '0.0484', 'reward': '0.02766', 'reward_std': '0.0484', 'frac_reward_zero_std': '0', 'entropy': '2.234', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.43', 'epoch': '0.08'}
- {'loss': '2.217e-08', 'grad_norm': '4.417', 'learning_rate': '2.775e-06', 'num_tokens': '3.202e+05', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.07909', 'rewards/compute_reward/std': '0.07921', 'reward': '0.07909', 'reward_std': '0.07921', 'frac_reward_zero_std': '0', 'entropy': '1.806', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.34', 'epoch': '0.09'}
- {'loss': '8.494e-08', 'grad_norm': '3.353', 'learning_rate': '2.525e-06', 'num_tokens': '3.554e+05', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.1476', 'rewards/compute_reward/std': '0.07424', 'reward': '0.1476', 'reward_std': '0.07424', 'frac_reward_zero_std': '0', 'entropy': '1.406', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.75', 'epoch': '0.1'}
262
-
263
-
264
-
265
- Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
266
-
267
  50%|█████ | 101/200 [20:36<29:22, 17.80s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
268
-
269
  51%|█████ | 102/200 [20:50<26:49, 16.42s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
270
-
271
  52%|█████▏ | 103/200 [21:01<24:14, 14.99s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
272
-
273
  52%|█████▏ | 104/200 [21:14<23:08, 14.47s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
274
-
275
  52%|█████▎ | 105/200 [21:28<22:16, 14.06s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
276
-
277
  53%|█████▎ | 106/200 [21:40<21:29, 13.72s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
278
-
279
  54%|█████▎ | 107/200 [21:52<20:09, 13.00s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
280
-
281
  54%|█████▍ | 108/200 [22:06<20:24, 13.31s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
282
-
283
  55%|█████▍ | 109/200 [22:17<19:09, 12.63s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
284
-
285
  55%|█████▌ | 110/200 [22:31<19:44, 13.17s/it]
286
 
287
  55%|█████▌ | 110/200 [22:32<19:44, 13.17s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
288
-
289
  56%|█████▌ | 111/200 [22:44<19:22, 13.06s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
290
-
291
  56%|█████▌ | 112/200 [22:56<18:50, 12.85s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
292
-
293
  56%|█████▋ | 113/200 [23:10<18:49, 12.98s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
294
-
295
  57%|█████▋ | 114/200 [23:21<18:04, 12.61s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
296
-
297
  57%|█████▊ | 115/200 [23:34<17:45, 12.54s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
298
-
299
  58%|█████▊ | 116/200 [23:46<17:21, 12.40s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
300
-
301
  58%|█████▊ | 117/200 [23:59<17:25, 12.59s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
302
-
303
  59%|█████▉ | 118/200 [24:13<17:38, 12.91s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
304
-
305
  60%|█████▉ | 119/200 [24:25<17:23, 12.89s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
306
-
307
  60%|██████ | 120/200 [24:38<16:57, 12.71s/it]
308
 
309
  60%|██████ | 120/200 [24:38<16:57, 12.71s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
310
-
311
  60%|██████ | 121/200 [24:52<17:12, 13.06s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
312
-
313
  61%|██████ | 122/200 [25:04<16:44, 12.88s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
314
-
315
  62%|██████▏ | 123/200 [25:14<15:30, 12.08s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
316
-
317
  62%|██████▏ | 124/200 [25:27<15:30, 12.25s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
318
-
319
  62%|██████▎ | 125/200 [25:40<15:28, 12.38s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
320
-
321
  63%|██████▎ | 126/200 [25:52<15:14, 12.36s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
322
-
323
  64%|██████▎ | 127/200 [26:03<14:31, 11.94s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
324
-
325
  64%|██████▍ | 128/200 [26:17<15:03, 12.55s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
326
-
327
  64%|██████▍ | 129/200 [26:29<14:45, 12.47s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
328
-
329
  65%|██████▌ | 130/200 [26:42<14:45, 12.64s/it]
330
 
331
  65%|██████▌ | 130/200 [26:42<14:45, 12.64s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
332
-
333
  66%|██████▌ | 131/200 [26:54<14:14, 12.38s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
334
-
335
  66%|██████▌ | 132/200 [27:05<13:30, 11.92s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
336
-
337
  66%|██████▋ | 133/200 [27:16<13:12, 11.83s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
338
-
339
  67%|██████▋ | 134/200 [27:28<12:46, 11.61s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
340
-
341
  68%|██████▊ | 135/200 [27:39<12:32, 11.58s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
342
-
343
  68%|██████▊ | 136/200 [27:51<12:37, 11.83s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
344
-
345
  68%|██████▊ | 137/200 [28:04<12:32, 11.95s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
346
-
347
  69%|██████▉ | 138/200 [28:18<13:00, 12.59s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
348
-
349
  70%|██████▉ | 139/200 [28:31<12:56, 12.73s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
350
-
351
  70%|███████ | 140/200 [28:44<12:51, 12.86s/it]
352
 
353
  70%|███████ | 140/200 [28:44<12:51, 12.86s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
354
-
355
  70%|███████ | 141/200 [28:56<12:23, 12.60s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
356
-
357
  71%|███████ | 142/200 [29:08<11:56, 12.36s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
358
-
359
  72%|███████▏ | 143/200 [29:20<11:38, 12.26s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
360
-
361
  72%|███████▏ | 144/200 [29:31<11:01, 11.81s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
362
-
363
  72%|███████▎ | 145/200 [29:43<10:59, 11.99s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
364
-
365
  73%|███████▎ | 146/200 [29:56<11:09, 12.40s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
366
-
367
  74%|███████▎ | 147/200 [30:10<11:22, 12.88s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
368
-
369
  74%|███████▍ | 148/200 [30:22<10:51, 12.53s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
370
-
371
  74%|███████▍ | 149/200 [30:34<10:23, 12.22s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
372
-
373
  75%|███████▌ | 150/200 [30:46<10:13, 12.27s/it]
374
 
375
  75%|███████▌ | 150/200 [30:46<10:13, 12.27s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
376
-
377
  76%|███████▌ | 151/200 [30:57<09:49, 12.03s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
378
-
379
  76%|███████▌ | 152/200 [31:09<09:33, 11.96s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
380
-
381
  76%|███████▋ | 153/200 [31:20<09:04, 11.58s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
382
-
383
  77%|███████▋ | 154/200 [31:32<09:02, 11.80s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
384
-
385
  78%|███████▊ | 155/200 [31:44<08:56, 11.91s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
386
-
387
  78%|███████▊ | 156/200 [31:55<08:32, 11.65s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
388
-
389
  78%|███████▊ | 157/200 [32:08<08:35, 11.98s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
390
-
391
  79%|███████▉ | 158/200 [32:20<08:18, 11.87s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
392
-
393
  80%|███████▉ | 159/200 [32:33<08:25, 12.32s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
394
-
395
  80%|████████ | 160/200 [32:45<08:10, 12.25s/it]
396
 
397
  80%|████████ | 160/200 [32:45<08:10, 12.25s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
398
-
399
  80%|████████ | 161/200 [32:56<07:40, 11.80s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
400
-
401
  81%|████████ | 162/200 [33:07<07:19, 11.58s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
402
-
403
  82%|████████▏ | 163/200 [33:20<07:25, 12.03s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
404
-
405
  82%|████████▏ | 164/200 [33:33<07:24, 12.35s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
406
-
407
  82%|████████▎ | 165/200 [33:47<07:25, 12.74s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
408
-
409
  83%|████████▎ | 166/200 [33:59<07:10, 12.68s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
410
-
411
  84%|████████▎ | 167/200 [34:13<07:02, 12.80s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
412
-
413
  84%|████████▍ | 168/200 [34:24<06:36, 12.39s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
414
-
415
  84%|████████▍ | 169/200 [34:37<06:28, 12.52s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
416
-
417
  85%|████████▌ | 170/200 [34:48<06:01, 12.06s/it]
418
 
419
  85%|████████▌ | 170/200 [34:48<06:01, 12.06s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
420
-
421
  86%|████████▌ | 171/200 [34:59<05:41, 11.76s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
422
-
423
  86%|████████▌ | 172/200 [35:11<05:29, 11.77s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
424
-
425
  86%|████████▋ | 173/200 [35:22<05:14, 11.65s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
426
-
427
  87%|████████▋ | 174/200 [35:34<05:03, 11.66s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
428
-
429
  88%|████████▊ | 175/200 [35:45<04:52, 11.71s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
430
-
431
  88%|████████▊ | 176/200 [35:59<04:50, 12.11s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
432
-
433
  88%|████████▊ | 177/200 [36:12<04:45, 12.42s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
434
-
435
  89%|████████▉ | 178/200 [36:24<04:33, 12.43s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
436
-
437
  90%|████████▉ | 179/200 [36:37<04:21, 12.45s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
438
-
439
  90%|█████████ | 180/200 [36:50<04:12, 12.63s/it]
440
 
441
  90%|█████████ | 180/200 [36:50<04:12, 12.63s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
442
-
443
  90%|█████████ | 181/200 [37:01<03:51, 12.19s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
444
-
445
  91%|█████████ | 182/200 [37:15<03:47, 12.66s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
446
-
447
  92%|█████████▏| 183/200 [37:27<03:33, 12.56s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
448
-
449
  92%|█████████▏| 184/200 [37:39<03:17, 12.34s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
450
-
451
  92%|█████████▎| 185/200 [37:51<03:04, 12.29s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
452
-
453
  93%|█████████▎| 186/200 [38:02<02:45, 11.82s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
454
-
455
  94%|█████████▎| 187/200 [38:13<02:31, 11.62s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
456
-
457
  94%|█████████▍| 188/200 [38:24<02:17, 11.49s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
458
-
459
  94%|█████████▍| 189/200 [38:35<02:05, 11.40s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
460
-
461
  95%|█████████▌| 190/200 [38:47<01:55, 11.53s/it]
462
 
463
  95%|█████████▌| 190/200 [38:47<01:55, 11.53s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
464
-
465
  96%|█████████▌| 191/200 [38:58<01:43, 11.49s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
466
-
467
  96%|█████████▌| 192/200 [39:10<01:31, 11.39s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
468
-
469
  96%|█████████▋| 193/200 [39:23<01:23, 11.99s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
470
-
471
  97%|█████████▋| 194/200 [39:34<01:10, 11.79s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
-
  98%|█████████▊| 195/200 [39:46<00:59, 11.85s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
-
  98%|█████████▊| 196/200 [39:57<00:46, 11.55s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
-
  98%|█████████▊| 197/200 [40:08<00:34, 11.34s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
-
  99%|█████████▉| 198/200 [40:20<00:23, 11.56s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
-
-
 
- {'loss': '1.505e-07', 'grad_norm': '2.707', 'learning_rate': '2.025e-06', 'num_tokens': '4.305e+05', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.229', 'rewards/compute_reward/std': '0.0403', 'reward': '0.229', 'reward_std': '0.0403', 'frac_reward_zero_std': '0', 'entropy': '0.8805', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '11.24', 'epoch': '0.12'}
- {'loss': '1.222e-07', 'grad_norm': '3.943', 'learning_rate': '1.775e-06', 'num_tokens': '4.672e+05', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.1833', 'rewards/compute_reward/std': '0.07255', 'reward': '0.1833', 'reward_std': '0.07255', 'frac_reward_zero_std': '0', 'entropy': '0.8755', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '11.07', 'epoch': '0.13'}
- {'loss': '-1.401e-07', 'grad_norm': '4.041', 'learning_rate': '1.525e-06', 'num_tokens': '5.03e+05', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.2078', 'rewards/compute_reward/std': '0.06581', 'reward': '0.2078', 'reward_std': '0.06581', 'frac_reward_zero_std': '0', 'entropy': '0.9737', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.84', 'epoch': '0.14'}
- {'loss': '2.086e-08', 'grad_norm': '3.721', 'learning_rate': '1.275e-06', 'num_tokens': '5.398e+05', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.2224', 'rewards/compute_reward/std': '0.05879', 'reward': '0.2224', 'reward_std': '0.05879', 'frac_reward_zero_std': '0', 'entropy': '0.9901', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.85', 'epoch': '0.15'}
- {'loss': '1.46e-07', 'grad_norm': '3.453', 'learning_rate': '1.025e-06', 'num_tokens': '5.754e+05', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.1993', 'rewards/compute_reward/std': '0.06031', 'reward': '0.1993', 'reward_std': '0.06031', 'frac_reward_zero_std': '0', 'entropy': '1.121', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.62', 'epoch': '0.16'}
- {'loss': '0.0005226', 'grad_norm': '3.998', 'learning_rate': '7.75e-07', 'num_tokens': '6.12e+05', 'completions/mean_length': '99.81', 'completions/min_length': '98.5', 'completions/max_length': '100', 'completions/clipped_ratio': '0.9875', 'completions/mean_terminated_length': '8.5', 'completions/min_terminated_length': '8.5', 'completions/max_terminated_length': '8.5', 'rewards/compute_reward/mean': '0.2155', 'rewards/compute_reward/std': '0.03223', 'reward': '0.2155', 'reward_std': '0.03223', 'frac_reward_zero_std': '0', 'entropy': '0.9432', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.9', 'epoch': '0.17'}
- {'loss': '-8.27e-08', 'grad_norm': '3.733', 'learning_rate': '5.25e-07', 'num_tokens': '6.473e+05', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.2123', 'rewards/compute_reward/std': '0.06549', 'reward': '0.2123', 'reward_std': '0.06549', 'frac_reward_zero_std': '0', 'entropy': '0.9799', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.85', 'epoch': '0.18'}
- {'loss': '-2.153e-07', 'grad_norm': '3.715', 'learning_rate': '2.75e-07', 'num_tokens': '6.82e+05', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.2195', 'rewards/compute_reward/std': '0.05461', 'reward': '0.2195', 'reward_std': '0.05461', 'frac_reward_zero_std': '0', 'entropy': '0.9462', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.46', 'epoch': '0.19'}
- {'loss': '1.043e-08', 'grad_norm': '3.797', 'learning_rate': '2.5e-08', 'num_tokens': '7.167e+05', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.2305', 'rewards/compute_reward/std': '0.0388', 'reward': '0.2305', 'reward_std': '0.0388', 'frac_reward_zero_std': '0', 'entropy': '0.8442', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.34', 'epoch': '0.2'}
-
-
-
-
 
- {'train_runtime': '2458', 'train_samples_per_second': '0.651', 'train_steps_per_second': '0.081', 'train_loss': '-0.001462', 'epoch': '0.2'}
-
- Unified model saved to training/checkpoints/unified_final
- Reward curve saved to training/unified_reward_curve.png
-
- Final reward values (last 20 steps):
- accuracy: 0.7212
- outcome: 0.3800
- bluff: -0.5000
- total: 0.2354
-
- Unified training complete.

  0%| | 0/200 [00:00<?, ?it/s]Passing `generation_config` together with generation-related arguments=({'disable_compile'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.
  0%| | 1/200 [00:14<48:31, 14.63s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  1%| | 2/200 [00:25<40:55, 12.40s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  2%|▏ | 3/200 [00:37<40:52, 12.45s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  2%|▏ | 4/200 [00:50<41:02, 12.56s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  2%|▎ | 5/200 [01:03<40:57, 12.60s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  3%|▎ | 6/200 [01:14<39:37, 12.26s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  4%|▎ | 7/200 [01:27<39:44, 12.36s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  4%|▍ | 8/200 [01:39<38:51, 12.14s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  4%|▍ | 9/200 [01:50<37:33, 11.80s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  5%|▌ | 10/200 [02:01<36:55, 11.66s/it]
 
  5%|▌ | 10/200 [02:01<36:55, 11.66s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  6%|▌ | 11/200 [02:14<37:32, 11.92s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  6%|▌ | 12/200 [02:25<36:31, 11.66s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  6%|▋ | 13/200 [02:36<36:00, 11.56s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  7%|▋ | 14/200 [02:47<35:24, 11.42s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  8%|▊ | 15/200 [03:00<36:25, 11.81s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  8%|▊ | 16/200 [03:12<36:28, 11.89s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  8%|▊ | 17/200 [03:25<37:13, 12.20s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  9%|▉ | 18/200 [03:37<36:45, 12.12s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  10%|▉ | 19/200 [03:48<36:05, 11.97s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  10%|█ | 20/200 [04:01<36:13, 12.07s/it]
 
  10%|█ | 20/200 [04:01<36:13, 12.07s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  10%|█ | 21/200 [04:14<36:57, 12.39s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  11%|█ | 22/200 [04:25<35:33, 11.99s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  12%|█▏ | 23/200 [04:37<35:37, 12.07s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  12%|█▏ | 24/200 [04:50<35:42, 12.18s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  12%|█▎ | 25/200 [05:00<34:15, 11.75s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  13%|█▎ | 26/200 [05:14<35:27, 12.23s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  14%|█▎ | 27/200 [05:25<34:26, 11.95s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  14%|█▍ | 28/200 [05:36<33:26, 11.66s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  14%|█▍ | 29/200 [05:49<34:30, 12.11s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  15%|█▌ | 30/200 [06:02<34:38, 12.23s/it]
 
  15%|█▌ | 30/200 [06:02<34:38, 12.23s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  16%|█▌ | 31/200 [06:13<33:24, 11.86s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  16%|█▌ | 32/200 [06:25<33:57, 12.13s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  16%|█▋ | 33/200 [06:38<34:16, 12.31s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  17%|█▋ | 34/200 [06:49<33:09, 11.99s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  18%|█▊ | 35/200 [07:00<32:13, 11.72s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  18%|█▊ | 36/200 [07:13<33:05, 12.11s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  18%|█▊ | 37/200 [07:24<32:02, 11.79s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  19%|█▉ | 38/200 [07:38<33:23, 12.37s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  20%|█▉ | 39/200 [07:52<34:03, 12.70s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  20%|██ | 40/200 [08:03<32:28, 12.18s/it]
 
  20%|██ | 40/200 [08:03<32:28, 12.18s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  20%|██ | 41/200 [08:15<32:11, 12.15s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  21%|██ | 42/200 [08:26<31:37, 12.01s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  22%|██▏ | 43/200 [08:37<30:27, 11.64s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  22%|██▏ | 44/200 [08:50<31:14, 12.01s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  22%|██▎ | 45/200 [09:03<31:47, 12.30s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  23%|██▎ | 46/200 [09:15<31:12, 12.16s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  24%|██▎ | 47/200 [09:27<30:57, 12.14s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  24%|██▍ | 48/200 [09:38<30:09, 11.90s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  24%|██▍ | 49/200 [09:52<31:30, 12.52s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  25%|██▌ | 50/200 [10:03<30:13, 12.09s/it]
 
  25%|██▌ | 50/200 [10:04<30:13, 12.09s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  26%|██▌ | 51/200 [10:18<31:43, 12.77s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  26%|██▌ | 52/200 [10:30<31:11, 12.64s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  26%|██▋ | 53/200 [10:42<30:30, 12.45s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  27%|██▋ | 54/200 [10:56<31:35, 12.98s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  28%|██▊ | 55/200 [11:10<31:37, 13.09s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  28%|██▊ | 56/200 [11:22<30:41, 12.79s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  28%|██▊ | 57/200 [11:33<29:33, 12.41s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  29%|██▉ | 58/200 [11:46<29:53, 12.63s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  30%|██▉ | 59/200 [11:59<29:27, 12.53s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  30%|███ | 60/200 [12:10<28:30, 12.22s/it]
 
  30%|███ | 60/200 [12:10<28:30, 12.22s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  30%|███ | 61/200 [12:22<27:57, 12.07s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  31%|███ | 62/200 [12:35<28:26, 12.37s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  32%|███▏ | 63/200 [12:48<28:43, 12.58s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  32%|███▏ | 64/200 [12:59<27:41, 12.21s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  32%|███▎ | 65/200 [13:12<27:59, 12.44s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  33%|███▎ | 66/200 [13:25<27:39, 12.39s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  34%|███▎ | 67/200 [13:37<27:13, 12.28s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  34%|███▍ | 68/200 [13:49<27:13, 12.37s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  34%|███▍ | 69/200 [14:00<26:03, 11.94s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  35%|███▌ | 70/200 [14:11<25:18, 11.68s/it]
 
  35%|███▌ | 70/200 [14:11<25:18, 11.68s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  36%|███▌ | 71/200 [14:24<25:42, 11.95s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  36%|███▌ | 72/200 [14:35<24:53, 11.67s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  36%|███▋ | 73/200 [14:48<25:25, 12.01s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  37%|███▋ | 74/200 [14:58<24:11, 11.52s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  38%|███▊ | 75/200 [15:11<24:38, 11.83s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  38%|███▊ | 76/200 [15:22<24:14, 11.73s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  38%|███▊ | 77/200 [15:33<23:19, 11.38s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  39%|███▉ | 78/200 [15:44<23:00, 11.31s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  40%|███▉ | 79/200 [15:57<23:41, 11.75s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  40%|████ | 80/200 [16:08<23:32, 11.77s/it]
 
  40%|████ | 80/200 [16:09<23:32, 11.77s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  40%|████ | 81/200 [16:20<23:29, 11.84s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  41%|████ | 82/200 [16:32<23:06, 11.75s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  42%|████▏ | 83/200 [16:43<22:32, 11.56s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  42%|████▏ | 84/200 [16:55<22:24, 11.59s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  42%|████▎ | 85/200 [17:06<22:05, 11.52s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  43%|████▎ | 86/200 [17:18<22:11, 11.68s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  44%|████▎ | 87/200 [17:30<22:11, 11.79s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  44%|████▍ | 88/200 [17:42<21:53, 11.73s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  44%|████▍ | 89/200 [17:53<21:22, 11.55s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  45%|████▌ | 90/200 [18:04<21:07, 11.52s/it]
 
  45%|████▌ | 90/200 [18:05<21:07, 11.52s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  46%|████▌ | 91/200 [18:17<21:21, 11.76s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  46%|████▌ | 92/200 [18:30<21:45, 12.09s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  46%|████▋ | 93/200 [18:41<21:18, 11.95s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  47%|████▋ | 94/200 [18:52<20:46, 11.76s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  48%|████▊ | 95/200 [19:03<20:08, 11.51s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  48%|████▊ | 96/200 [19:15<20:08, 11.62s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  48%|████▊ | 97/200 [19:28<20:36, 12.01s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  49%|████▉ | 98/200 [19:41<20:57, 12.33s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  50%|████▉ | 99/200 [19:54<21:03, 12.51s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  50%|█████ | 100/200 [20:05<20:03, 12.04s/it]
 
  50%|█████ | 100/200 [20:05<20:03, 12.04s/it]{'loss': '3.099e-07', 'grad_norm': '0.7345', 'learning_rate': '4.775e-06', 'num_tokens': '3.58e+04', 'completions/mean_length': '100', 'completions/min_length': '100', 'completions/max_length': '100', 'completions/clipped_ratio': '1', 'completions/mean_terminated_length': '0', 'completions/min_terminated_length': '0', 'completions/max_terminated_length': '0', 'rewards/compute_reward/mean': '0.01269', 'rewards/compute_reward/std': '0.02462', 'reward': '0.01269', 'reward_std': '0.02462', 'frac_reward_zero_std': '0', 'entropy': '1.357', 'clip_ratio/low_mean': '0', 'clip_ratio/low_min': '0', 'clip_ratio/high_mean': '0', 'clip_ratio/high_max': '0', 'clip_ratio/region_mean': '0', 'step_time': '10.71', 'epoch': '0.01'}
  50%|█████ | 101/200 [20:36<29:22, 17.80s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  51%|█████ | 102/200 [20:50<26:49, 16.42s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  52%|█████▏ | 103/200 [21:01<24:14, 14.99s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  52%|█████▏ | 104/200 [21:14<23:08, 14.47s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  52%|█████▎ | 105/200 [21:28<22:16, 14.06s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  53%|█████▎ | 106/200 [21:40<21:29, 13.72s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  54%|█████▎ | 107/200 [21:52<20:09, 13.00s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  54%|█████▍ | 108/200 [22:06<20:24, 13.31s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  55%|█████▍ | 109/200 [22:17<19:09, 12.63s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  55%|█████▌ | 110/200 [22:31<19:44, 13.17s/it]
 
  55%|█████▌ | 110/200 [22:32<19:44, 13.17s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  56%|█████▌ | 111/200 [22:44<19:22, 13.06s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  56%|█████▌ | 112/200 [22:56<18:50, 12.85s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  56%|█████▋ | 113/200 [23:10<18:49, 12.98s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  57%|█████▋ | 114/200 [23:21<18:04, 12.61s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  57%|█████▊ | 115/200 [23:34<17:45, 12.54s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  58%|█████▊ | 116/200 [23:46<17:21, 12.40s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  58%|█████▊ | 117/200 [23:59<17:25, 12.59s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  59%|█████▉ | 118/200 [24:13<17:38, 12.91s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  60%|█████▉ | 119/200 [24:25<17:23, 12.89s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  60%|██████ | 120/200 [24:38<16:57, 12.71s/it]
 
  60%|██████ | 120/200 [24:38<16:57, 12.71s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  60%|██████ | 121/200 [24:52<17:12, 13.06s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  61%|██████ | 122/200 [25:04<16:44, 12.88s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  62%|██████▏ | 123/200 [25:14<15:30, 12.08s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  62%|██████▏ | 124/200 [25:27<15:30, 12.25s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  62%|██████▎ | 125/200 [25:40<15:28, 12.38s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  63%|██████▎ | 126/200 [25:52<15:14, 12.36s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  64%|██████▎ | 127/200 [26:03<14:31, 11.94s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  64%|██████▍ | 128/200 [26:17<15:03, 12.55s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  64%|██████▍ | 129/200 [26:29<14:45, 12.47s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  65%|██████▌ | 130/200 [26:42<14:45, 12.64s/it]
 
  65%|██████▌ | 130/200 [26:42<14:45, 12.64s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  66%|██████▌ | 131/200 [26:54<14:14, 12.38s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  66%|██████▌ | 132/200 [27:05<13:30, 11.92s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  66%|██████▋ | 133/200 [27:16<13:12, 11.83s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  67%|██████▋ | 134/200 [27:28<12:46, 11.61s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  68%|██████▊ | 135/200 [27:39<12:32, 11.58s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  68%|██████▊ | 136/200 [27:51<12:37, 11.83s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  68%|██████▊ | 137/200 [28:04<12:32, 11.95s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  69%|██████▉ | 138/200 [28:18<13:00, 12.59s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  70%|██████▉ | 139/200 [28:31<12:56, 12.73s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  70%|███████ | 140/200 [28:44<12:51, 12.86s/it]
 
  70%|███████ | 140/200 [28:44<12:51, 12.86s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  70%|███████ | 141/200 [28:56<12:23, 12.60s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  71%|███████ | 142/200 [29:08<11:56, 12.36s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  72%|███████▏ | 143/200 [29:20<11:38, 12.26s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  72%|███████▏ | 144/200 [29:31<11:01, 11.81s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  72%|███████▎ | 145/200 [29:43<10:59, 11.99s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  73%|███████▎ | 146/200 [29:56<11:09, 12.40s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  74%|███████▎ | 147/200 [30:10<11:22, 12.88s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  74%|███████▍ | 148/200 [30:22<10:51, 12.53s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  74%|███████▍ | 149/200 [30:34<10:23, 12.22s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  75%|███████▌ | 150/200 [30:46<10:13, 12.27s/it]
 
  75%|███████▌ | 150/200 [30:46<10:13, 12.27s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  76%|███████▌ | 151/200 [30:57<09:49, 12.03s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  76%|███████▌ | 152/200 [31:09<09:33, 11.96s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  76%|███████▋ | 153/200 [31:20<09:04, 11.58s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  77%|███████▋ | 154/200 [31:32<09:02, 11.80s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  78%|███████▊ | 155/200 [31:44<08:56, 11.91s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  78%|███████▊ | 156/200 [31:55<08:32, 11.65s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  78%|███████▊ | 157/200 [32:08<08:35, 11.98s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  79%|███████▉ | 158/200 [32:20<08:18, 11.87s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  80%|███████▉ | 159/200 [32:33<08:25, 12.32s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  80%|████████ | 160/200 [32:45<08:10, 12.25s/it]
 
  80%|████████ | 160/200 [32:45<08:10, 12.25s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  80%|████████ | 161/200 [32:56<07:40, 11.80s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  81%|████████ | 162/200 [33:07<07:19, 11.58s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  82%|████████▏ | 163/200 [33:20<07:25, 12.03s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  82%|████████▏ | 164/200 [33:33<07:24, 12.35s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  82%|████████▎ | 165/200 [33:47<07:25, 12.74s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  83%|████████▎ | 166/200 [33:59<07:10, 12.68s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  84%|████████▎ | 167/200 [34:13<07:02, 12.80s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  84%|████████▍ | 168/200 [34:24<06:36, 12.39s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  84%|████████▍ | 169/200 [34:37<06:28, 12.52s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  85%|████████▌ | 170/200 [34:48<06:01, 12.06s/it]
 
  85%|████████▌ | 170/200 [34:48<06:01, 12.06s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  86%|████████▌ | 171/200 [34:59<05:41, 11.76s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  86%|████████▌ | 172/200 [35:11<05:29, 11.77s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  86%|████████▋ | 173/200 [35:22<05:14, 11.65s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  87%|████████▋ | 174/200 [35:34<05:03, 11.66s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  88%|████████▊ | 175/200 [35:45<04:52, 11.71s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  88%|████████▊ | 176/200 [35:59<04:50, 12.11s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  88%|████████▊ | 177/200 [36:12<04:45, 12.42s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  89%|████████▉ | 178/200 [36:24<04:33, 12.43s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  90%|████████▉ | 179/200 [36:37<04:21, 12.45s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  90%|█████████ | 180/200 [36:50<04:12, 12.63s/it]
 
  90%|█████████ | 180/200 [36:50<04:12, 12.63s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  90%|█████████ | 181/200 [37:01<03:51, 12.19s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  91%|█████████ | 182/200 [37:15<03:47, 12.66s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  92%|█████████▏| 183/200 [37:27<03:33, 12.56s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  92%|█████████▏| 184/200 [37:39<03:17, 12.34s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  92%|█████████▎| 185/200 [37:51<03:04, 12.29s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  93%|█████████▎| 186/200 [38:02<02:45, 11.82s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  94%|█████████▎| 187/200 [38:13<02:31, 11.62s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  94%|█████████▍| 188/200 [38:24<02:17, 11.49s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  94%|█████████▍| 189/200 [38:35<02:05, 11.40s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  95%|█████████▌| 190/200 [38:47<01:55, 11.53s/it]
  95%|█████████▌| 190/200 [38:47<01:55, 11.53s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  96%|█████████▌| 191/200 [38:58<01:43, 11.49s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  96%|█████████▌| 192/200 [39:10<01:31, 11.39s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  96%|█████████▋| 193/200 [39:23<01:23, 11.99s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  97%|█████████▋| 194/200 [39:34<01:10, 11.79s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  98%|█████████▊| 195/200 [39:46<00:59, 11.85s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  98%|█████████▊| 196/200 [39:57<00:46, 11.55s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  98%|█████████▊| 197/200 [40:08<00:34, 11.34s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
  99%|█████████▉| 198/200 [40:20<00:23, 11.56s/it]Both `max_new_tokens` (=100) and `max_length`(=2048) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
 
 
wandb/debug-cli.rayyan.log ADDED
File without changes
wandb/settings ADDED
@@ -0,0 +1,3 @@
+ [default]
+ mode = disabled
+