VoyagerXHF commited on
Commit
95a723d
·
verified ·
1 Parent(s): b2c1da7

fix: rename response variable to chat_response in README code example

Browse files
Files changed (1) hide show
  1. README.md +11 -11
README.md CHANGED
@@ -1,10 +1,13 @@
1
  ---
 
 
 
 
2
  library_name: transformers
3
  license: apache-2.0
4
  license_link: https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8/blob/main/LICENSE
5
  pipeline_tag: image-text-to-text
6
- base_model:
7
- - Qwen/Qwen3.6-35B-A3B
8
  ---
9
 
10
  # Qwen3.6-35B-A3B-FP8
@@ -666,8 +669,7 @@ export OPENAI_API_KEY="EMPTY"
666
  > We recommend using the following set of sampling parameters for generation
667
  > - Thinking mode for general tasks: `temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0`
668
  > - Thinking mode for precise coding tasks (e.g. WebDev): `temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0`
669
- > - Instruct (or non-thinking) mode for general tasks: `temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0`
670
- > - Instruct (or non-thinking) mode for reasoning tasks: `temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0`
671
  >
672
  > Please note that the support for sampling parameters varies according to inference frameworks.
673
 
@@ -727,7 +729,7 @@ messages = [
727
  }
728
  ]
729
 
730
- response = client.chat.completions.create(
731
  model="Qwen/Qwen3.6-35B-A3B-FP8",
732
  messages=messages,
733
  max_tokens=81920,
@@ -772,7 +774,7 @@ messages = [
772
  #
773
  # By default, `fps=2` and `do_sample_frames=True`.
774
  # With `do_sample_frames=True`, you can customize the `fps` value to set your desired video sampling rate.
775
- response = client.chat.completions.create(
776
  model="Qwen/Qwen3.6-35B-A3B-FP8",
777
  messages=messages,
778
  max_tokens=81920,
@@ -1011,10 +1013,8 @@ To achieve optimal performance, we recommend the following settings:
1011
  `temperature=1.0`, `top_p=0.95`, `top_k=20`, `min_p=0.0`, `presence_penalty=1.5`, `repetition_penalty=1.0`
1012
  - **Thinking mode for precise coding tasks (e.g., WebDev)**:
1013
  `temperature=0.6`, `top_p=0.95`, `top_k=20`, `min_p=0.0`, `presence_penalty=0.0`, `repetition_penalty=1.0`
1014
- - **Instruct (or non-thinking) mode for general tasks**:
1015
- `temperature=0.7`, `top_p=0.8`, `top_k=20`, `min_p=0.0`, `presence_penalty=1.5`, `repetition_penalty=1.0`
1016
- - **Instruct (or non-thinking) mode for reasoning tasks**:
1017
- `temperature=1.0`, `top_p=1.0`, `top_k=40`, `min_p=0.0`, `presence_penalty=2.0`, `repetition_penalty=1.0`
1018
  - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
1019
 
1020
  2. **Adequate Output Length**: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 81,920 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance.
@@ -1043,4 +1043,4 @@ If you find our work helpful, feel free to give us a cite.
1043
  month = {April},
1044
  year = {2026}
1045
  }
1046
- ```
 
1
  ---
2
+ base_model:
3
+ - Qwen/Qwen3.6-35B-A3B
4
+ frameworks:
5
+ - ""
6
  library_name: transformers
7
  license: apache-2.0
8
  license_link: https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8/blob/main/LICENSE
9
  pipeline_tag: image-text-to-text
10
+ tasks: []
 
11
  ---
12
 
13
  # Qwen3.6-35B-A3B-FP8
 
669
  > We recommend using the following set of sampling parameters for generation
670
  > - Thinking mode for general tasks: `temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0`
671
  > - Thinking mode for precise coding tasks (e.g. WebDev): `temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0`
672
+ > - Instruct (or non-thinking) mode: `temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0`
 
673
  >
674
  > Please note that the support for sampling parameters varies according to inference frameworks.
675
 
 
729
  }
730
  ]
731
 
732
+ chat_response = client.chat.completions.create(
733
  model="Qwen/Qwen3.6-35B-A3B-FP8",
734
  messages=messages,
735
  max_tokens=81920,
 
774
  #
775
  # By default, `fps=2` and `do_sample_frames=True`.
776
  # With `do_sample_frames=True`, you can customize the `fps` value to set your desired video sampling rate.
777
+ chat_response = client.chat.completions.create(
778
  model="Qwen/Qwen3.6-35B-A3B-FP8",
779
  messages=messages,
780
  max_tokens=81920,
 
1013
  `temperature=1.0`, `top_p=0.95`, `top_k=20`, `min_p=0.0`, `presence_penalty=1.5`, `repetition_penalty=1.0`
1014
  - **Thinking mode for precise coding tasks (e.g., WebDev)**:
1015
  `temperature=0.6`, `top_p=0.95`, `top_k=20`, `min_p=0.0`, `presence_penalty=0.0`, `repetition_penalty=1.0`
1016
+ - **Instruct (or non-thinking) mode**:
1017
+ `temperature=0.7`, `top_p=0.80`, `top_k=20`, `min_p=0.0`, `presence_penalty=1.5`, `repetition_penalty=1.0`
 
 
1018
  - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
1019
 
1020
  2. **Adequate Output Length**: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 81,920 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance.
 
1043
  month = {April},
1044
  year = {2026}
1045
  }
1046
+ ```