Description
- Did you update? `pip install --upgrade unsloth unsloth_zoo` -> Yes
- Colab or Kaggle or local / cloud -> local
- Number of GPUs used (`nvidia-smi`) -> 1
- Which notebook? Please link! -> Llama Factory
- Which Unsloth version, TRL version, transformers version, PyTorch version? -> Unsloth 2026.2.1 (Fast Llama patching), Transformers 4.57.6, Torch 2.10.0+cu130, CUDA capability 8.6, CUDA Toolkit 13.0, Triton 3.6.0
- Which trainer? `SFTTrainer`, `GRPOTrainer`, etc. -> SFT (Llama Factory)
[INFO|configuration_utils.py:765] 2026-03-03 22:08:38,762 >> loading configuration file config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--unsloth--Hermes-3-Llama-3.1-8B\snapshots\d90c07b930d73927ba6798f76bf611d857234229\config.json
[INFO|configuration_utils.py:839] 2026-03-03 22:08:38,762 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128040,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": true,
"vocab_size": 128256
}
[INFO|2026-03-03 22:08:39] llamafactory.data.template:144 >> Add <|im_start|> to stop words.
[WARNING|2026-03-03 22:08:39] llamafactory.data.template:149 >> New tokens have been added, make sure `resize_vocab` is True.
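Side note on the warning above: LLaMA Factory adds `<|im_start|>` to the tokenizer for the ChatML template, and `resize_vocab` only matters if that token is genuinely new to the vocabulary. A minimal sketch of what the resize amounts to in plain `transformers` (illustrative only, not LLaMA Factory's actual code path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch only -- not LLaMA Factory's actual code path.
tokenizer = AutoTokenizer.from_pretrained("unsloth/Hermes-3-Llama-3.1-8B")
model = AutoModelForCausalLM.from_pretrained("unsloth/Hermes-3-Llama-3.1-8B")

# If <|im_start|> is genuinely new to the vocab, the embedding matrix must
# grow to match; for Hermes 3 the token may already exist, making this a no-op.
num_added = tokenizer.add_special_tokens({"additional_special_tokens": ["<|im_start|>"]})
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))  # grow input/output embeddings
```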
[INFO|configuration_utils.py:765] 2026-03-03 22:08:39,716 >> loading configuration file config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--unsloth--Hermes-3-Llama-3.1-8B\snapshots\d90c07b930d73927ba6798f76bf611d857234229\config.json
[INFO|configuration_utils.py:839] 2026-03-03 22:08:39,716 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128040,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": true,
"vocab_size": 128256
}
[INFO|2026-03-03 22:08:39] llamafactory.model.model_utils.kv_cache:144 >> KV cache is enabled for faster generation.
[INFO|configuration_utils.py:765] 2026-03-03 22:08:40,280 >> loading configuration file config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--unsloth--Hermes-3-Llama-3.1-8B\snapshots\d90c07b930d73927ba6798f76bf611d857234229\config.json
[INFO|configuration_utils.py:839] 2026-03-03 22:08:40,280 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128040,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": true,
"vocab_size": 128256
}
Unsloth: WARNING `trust_remote_code` is True.
Are you certain you want to do remote code execution?
==((====))== Unsloth 2026.2.1: Fast Llama patching. Transformers: 4.57.6.
\\ /| NVIDIA GeForce RTX 3090. Num GPUs = 1. Max memory: 24.0 GB. Platform: Windows.
O^O/ \_/ \ Torch: 2.10.0+cu130. CUDA: 8.6. CUDA Toolkit: 13.0. Triton: 3.6.0
\ / Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
[INFO|configuration_utils.py:765] 2026-03-03 22:08:45,334 >> loading configuration file config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--unsloth--Hermes-3-Llama-3.1-8B\snapshots\d90c07b930d73927ba6798f76bf611d857234229\config.json
[INFO|configuration_utils.py:839] 2026-03-03 22:08:45,334 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128040,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": true,
"vocab_size": 128256
}
[INFO|configuration_utils.py:765] 2026-03-03 22:08:45,484 >> loading configuration file config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--unsloth--Hermes-3-Llama-3.1-8B\snapshots\d90c07b930d73927ba6798f76bf611d857234229\config.json
[INFO|configuration_utils.py:839] 2026-03-03 22:08:45,500 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"dtype": "bfloat16",
"eos_token_id": 128040,
"head_dim": 128,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 131072,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": {
"factor": 8.0,
"high_freq_factor": 4.0,
"low_freq_factor": 1.0,
"original_max_position_embeddings": 8192,
"rope_type": "llama3"
},
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"transformers_version": "4.57.6",
"use_cache": true,
"vocab_size": 128256
}
[INFO|modeling_utils.py:1172] 2026-03-03 22:08:45,501 >> loading weights file model.safetensors from cache at C:\Users\Mykee\.cache\huggingface\hub\models--unsloth--Hermes-3-Llama-3.1-8B\snapshots\d90c07b930d73927ba6798f76bf611d857234229\model.safetensors.index.json
[INFO|modeling_utils.py:2341] 2026-03-03 22:08:45,501 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:986] 2026-03-03 22:08:45,501 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128040
}
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:06<00:00, 1.75s/it]
[INFO|configuration_utils.py:941] 2026-03-03 22:08:53,074 >> loading configuration file generation_config.json from cache at C:\Users\Mykee\.cache\huggingface\hub\models--unsloth--Hermes-3-Llama-3.1-8B\snapshots\d90c07b930d73927ba6798f76bf611d857234229\generation_config.json
[INFO|configuration_utils.py:986] 2026-03-03 22:08:53,074 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": 128040,
"temperature": 0.6,
"top_p": 0.9
}
[INFO|dynamic_module_utils.py:423] 2026-03-03 22:08:53,224 >> Could not locate the custom_generate/generate.py inside unsloth/Hermes-3-Llama-3.1-8B.
Traceback (most recent call last):
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\queueing.py", line 849, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\route_utils.py", line 354, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\blocks.py", line 2191, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\blocks.py", line 1710, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\utils.py", line 760, in async_iteration
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\utils.py", line 751, in __anext__
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\anyio\to_thread.py", line 63, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 2502, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 986, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\utils.py", line 734, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\gradio\utils.py", line 898, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\webui\chatter.py", line 158, in load_model
super().__init__(args)
File "E:\LlamaFactory\src\llamafactory\chat\chat_model.py", line 53, in __init__
self.engine: BaseEngine = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\chat\hf_engine.py", line 59, in __init__
self.model = load_model(
^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\model\loader.py", line 189, in load_model
model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\model\adapter.py", line 360, in init_adapter
model = _setup_lora_tuning(
^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\model\adapter.py", line 208, in _setup_lora_tuning
model = load_unsloth_peft_model(config, model_args, finetuning_args, is_trainable=is_trainable)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\src\llamafactory\model\model_utils\unsloth.py", line 96, in load_unsloth_peft_model
model, _ = FastLanguageModel.from_pretrained(**unsloth_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\unsloth\models\loader.py", line 602, in from_pretrained
model, tokenizer = dispatch_model.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\unsloth\models\llama.py", line 2490, in from_pretrained
tokenizer = load_correct_tokenizer(
^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\unsloth\tokenizer_utils.py", line 622, in load_correct_tokenizer
chat_template = fix_chat_template(tokenizer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\LlamaFactory\venv\Lib\site-packages\unsloth\tokenizer_utils.py", line 734, in fix_chat_template
raise RuntimeError(
RuntimeError: Unsloth: The tokenizer `saves\Llama-3.1-8B-Instruct\lora\train_2026-03-03-20-56-39-Hermes-v1`
does not have a {% if add_generation_prompt %} for generation purposes.
Please file a bug report to the maintainers of `saves\Llama-3.1-8B-Instruct\lora\train_2026-03-03-20-56-39-Hermes-v1` - thanks!
I can't load the LoRA into the test chat with the Unsloth optimizer for the Unsloth Hermes 3 8B model. I trained it with a ChatML template.
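A possible workaround (untested sketch): `fix_chat_template` in `unsloth/tokenizer_utils.py` raises this error whenever the saved tokenizer's chat template has no `{% if add_generation_prompt %}` branch. Appending a ChatML-style generation branch to the adapter's tokenizer and re-saving it should satisfy that check. The path below is the one from the error message; the exact Jinja text is an assumption based on the standard ChatML template:

```python
from transformers import AutoTokenizer

# Untested workaround sketch. The adapter path is taken from the error above;
# the appended Jinja is the standard ChatML generation branch, which is an
# assumption about what the training template expects.
adapter_dir = r"saves\Llama-3.1-8B-Instruct\lora\train_2026-03-03-20-56-39-Hermes-v1"
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)

# ChatML generation prompt: open an assistant turn when generation is requested.
generation_branch = (
    "{% if add_generation_prompt %}"
    "{{ '<|im_start|>assistant\n' }}"
    "{% endif %}"
)

if "add_generation_prompt" not in (tokenizer.chat_template or ""):
    tokenizer.chat_template = (tokenizer.chat_template or "") + generation_branch
    tokenizer.save_pretrained(adapter_dir)  # rewrites tokenizer_config.json
```

If that lets `load_correct_tokenizer` pass, the remaining question is whether the patched template matches the one LLaMA Factory used during training; if not, the cleaner fix is to re-export the adapter with the full ChatML template.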