Huggingface Tutorial Notes
[Code] A summary of practical experience with Hugging Face's Transformers library.
from transformers import BertModel
Reference: https://blog.csdn.net/meiqi0538/article/details/124891560
In the BERT model, out.last_hidden_state[:, 0, :] and pooler_output both correspond to the hidden state of the first token of the input sequence (usually the [CLS] token), but they are not identical.

out.last_hidden_state[:, 0, :] is the last layer's hidden state, taken directly from the model output. pooler_output, in contrast, is that hidden state passed through an additional fully connected layer and a Tanh activation. The weights of this fully connected layer are learned during pretraining and are meant for downstream tasks such as text classification.

In some cases out.last_hidden_state[:, 0, :] and pooler_output may have similar values, but they are not exactly the same. If you observe that they are identical in your model, it may be because the model has not been trained and the fully connected layer still holds its initial weights, or because something went wrong during training.
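A minimal sketch comparing the two outputs (the bert-base-chinese checkpoint and the example sentence are only assumptions; any BERT checkpoint behaves the same way):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = BertModel.from_pretrained('bert-base-chinese')
model.eval()

inputs = tokenizer('今天天气很好', return_tensors='pt')
with torch.no_grad():
    out = model(**inputs)

cls_hidden = out.last_hidden_state[:, 0, :]  # raw [CLS] hidden state from the last layer
pooled = out.pooler_output                   # [CLS] hidden state after the pooler (Linear + Tanh)

print(torch.allclose(cls_hidden, pooled))    # generally False because of the extra pooler layer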
The evaluate library
Due to network restrictions, the code below may fail to run directly:
import evaluate
metrics_list = evaluate.list_evaluation_modules(module_type='metric')
print(metrics_list)
Following the advice of the mirror site https://hf-mirror.com/, change the built-in HF_ENDPOINT. In the config.py file of the evaluate package, make the following modification:
# Hub
# HF_ENDPOINT = os.environ.get("HF_ENDPOINT", "https://huggingface.co")
HF_ENDPOINT = os.environ.get("HF_ENDPOINT", "https://hf-mirror.com")
Remember to restart the environment (or kernel) after making the change.
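An alternative sketch that avoids editing the installed package: set the HF_ENDPOINT environment variable before importing evaluate (this assumes your evaluate version reads the variable at import time, which is what the config.py snippet above does):

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'  # must be set before evaluate is imported

import evaluate
metrics_list = evaluate.list_evaluation_modules(module_type='metric')
print(metrics_list)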
The datasets library
Basic usage for saving and loading datasets with datasets
load_dataset() vs load_from_disk()
load_dataset and load_from_disk are two different functions in the Hugging Face datasets library. Both can load datasets, but their usage and purpose differ.

load_dataset loads one of the predefined datasets of the Hugging Face datasets library, or a user-defined dataset. When a local path is passed, load_dataset tries to load the dataset script and data files under that path; the dataset script is a Python script that defines how the data is processed and loaded. When loading by name, a copy of the predefined data is automatically downloaded from the Hugging Face Hub.
dataset = load_dataset(path='path/to/dataset/directory')
load_from_disk loads a dataset that was previously saved to disk with Dataset.save_to_disk. It does not need a dataset script, because the data has already been processed and formatted and can be loaded directly.
# Save the result returned by load_dataset. Note: if you save only the train split (or any other single split), the parameter below must be dataset_path instead, because the saved object is then a Dataset rather than a DatasetDict.
dataset.save_to_disk(dataset_dict_path='./data/ChnSentiCorp')
# Load a dataset saved with save_to_disk
dataset = load_from_disk('path/to/dataset/directory')
So when you use load_dataset to load a dataset from a local path, you need to provide the dataset script and data files; when you use load_from_disk, you only need the path of the previously saved dataset.
from datasets import load_dataset
#load the data
dataset = load_dataset(path='data/lansinuote/ChnSentiCorp',split='train')
dataset
Saving data in other formats
#local path
dataset = load_dataset(path='data/lansinuote/ChnSentiCorp')
#hub path
dataset = load_dataset(path='lansinuote/ChnSentiCorp')
#save to disk in the default format
dataset.save_to_disk(dataset_dict_path='./data/ChnSentiCorp')
#load from disk in the default format
dataset = load_from_disk('./data/ChnSentiCorp')
#Chapter 3: export to CSV; all data is stored in a single ChnSentiCorp.csv file
dataset = load_dataset(path='lansinuote/ChnSentiCorp', split='train')
dataset.to_csv(path_or_buf='./data/ChnSentiCorp.csv')
#load the CSV data
csv_dataset = load_dataset(path='csv',
data_files='./data/ChnSentiCorp.csv',
split='train')
csv_dataset[20]
#Chapter 3: export to JSON; all data is stored in a single ChnSentiCorp.json file
dataset = load_dataset(path='lansinuote/ChnSentiCorp', split='train')
dataset.to_json(path_or_buf='./data/ChnSentiCorp.json')
#load the JSON data
json_dataset = load_dataset(path='json',
data_files='./data/ChnSentiCorp.json',
split='train')
json_dataset[20]
transformers
transformers.Trainer
Dataset preparation
trainer.train() must be fed a class that inherits from torch.utils.data.Dataset and wraps your own data. The datasets library is only a convenient data store for reading data; for the actual training you still need to write such a subclass.

The class usually implements __getitem__ so that, given an index, it returns the dictionary produced by tokenizer() for the corresponding example. That dictionary typically contains the keys input_ids, attention_mask and token_type_ids, i.e. the token IDs, the attention mask and the token type IDs.

input_ids: a list with the vocabulary ID of each token. The model uses these IDs to look up each token's embedding.

The difference between attention_mask and token_type_ids is that attention_mask marks which positions are padding, while token_type_ids marks which tokens belong to the first sentence and which to the second. For single-sentence data, token_type_ids is usually not needed. Additional keys such as 'labels' can also be added. The minimum content of the returned dictionary is therefore 'input_ids' and 'attention_mask'; an example is shown below, followed by a minimal Dataset sketch.
{'input_ids': tensor([ 0, 2847, 38, 21, XXXXXX 4, 2, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, XXX]),
'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, XXX 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, XXX]),
'labels': tensor(1.)}
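A minimal sketch of such a Dataset wrapper (the column names 'text' and 'label' are assumptions and should match your own data):

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, hf_dataset, tokenizer, max_length=128):
        self.data = hf_dataset        # e.g. the object returned by load_dataset(..., split='train')
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        item = self.data[index]
        encoding = self.tokenizer(
            item['text'],
            padding='max_length',
            truncation=True,
            max_length=self.max_length,
            return_tensors='pt',
        )
        return {
            'input_ids': encoding['input_ids'].squeeze(0),
            'attention_mask': encoding['attention_mask'].squeeze(0),
            'labels': torch.tensor(item['label'], dtype=torch.long),
        }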
This dataset is then passed to Trainer() together with the model that consumes it, and trainer.train() uses the data and the model to run training automatically.

It follows that what your torch.utils.data.Dataset subclass returns can be tailored to your situation: if it simply returns the lightly processed raw data, then the model is usually written to handle exactly that data, and tokenization, embedding lookup and similar steps happen inside the model.
- Supplement: how can input_ids be used to look up each token's embedding?
In a Transformer model, each token's embedding is produced by an embedding layer that is part of the model; you can access this layer through the model's embeddings attribute.

Here is a simple example showing how to use input_ids to look up each token's embedding:
# assume you already have a model and some input_ids
model = ...
input_ids = ...
# get the embeddings
embeddings = model.embeddings(input_ids)
In this example, model.embeddings(input_ids) returns a tensor of shape (batch_size, sequence_length, embedding_dim); each element of this tensor is a token embedding.

Note that this example assumes your model has an embeddings attribute that accepts input_ids and returns embeddings. Different models may expose different interfaces, so check your model's documentation to see how to obtain the embeddings.
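A more portable sketch: Hugging Face models expose get_input_embeddings(), which returns the token embedding layer regardless of how the attribute is named internally (bert-base-chinese here is only an example checkpoint):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = BertModel.from_pretrained('bert-base-chinese')

input_ids = tokenizer('你好,世界', return_tensors='pt')['input_ids']
embedding_layer = model.get_input_embeddings()   # an nn.Embedding(vocab_size, hidden_size)
token_embeddings = embedding_layer(input_ids)    # shape: (batch_size, sequence_length, hidden_size)
print(token_embeddings.shape)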
Training a model directly with trainer.train()
from transformers import (
    AutoConfig,        # load the model configuration from a pretrained model
    AutoModel,
    AutoTokenizer,     # load the tokenizer from a pretrained model
    EvalPrediction,
    HfArgumentParser,
    Trainer, AdamW,    # import the Trainer
    TrainingArguments,
    set_seed,
)
trainer = Trainer(
    model=model,
    args=training_args,  # transformers.TrainingArguments: training configuration such as learning rate, batch size, number of epochs, etc.
    train_dataset=train_dataset,      # training dataset
    eval_dataset=eval_dataset,        # evaluation dataset
    compute_metrics=compute_metrics,  # function that computes the evaluation metrics
)
trainer.train(  # pass the path of the pretrained model and start the training loop
    model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
)
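A minimal sketch of the compute_metrics function passed to the Trainer above (accuracy is only an example metric, and the predictions are assumed to be class logits):

import numpy as np
from transformers import EvalPrediction

def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=-1)        # logits -> predicted class ids
    accuracy = float((preds == p.label_ids).mean())  # fraction of correct predictions
    return {'accuracy': accuracy}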
class Trainer:
# This code chooses an appropriate sampler for the training dataset, depending on its type and the hardware environment.
def _get_train_sampler(self) -> Optional[torch.utils.data.sampler.Sampler]:
if isinstance(self.train_dataset, torch.utils.data.IterableDataset):
return None
elif is_torch_tpu_available():
return get_tpu_sampler(self.train_dataset)
else:
return (
RandomSampler(self.train_dataset)
if self.args.local_rank == -1
else DistributedSampler(self.train_dataset)
)
class Trainer:
# This code builds and returns a DataLoader for the training dataset, picking the sampler as above, for use during training.
def get_train_dataloader(self) -> DataLoader:
"""
Returns the training :class:`~torch.utils.data.DataLoader`.
Will use no sampler if :obj:`self.train_dataset` is a :obj:`torch.utils.data.IterableDataset`, a random sampler
(adapted to distributed training if necessary) otherwise.
Subclass and override this method if you want to inject some custom behavior.
"""
if self.train_dataset is None:
raise ValueError("Trainer: training requires a train_dataset.")
train_sampler = self._get_train_sampler()
return DataLoader(
self.train_dataset,
batch_size=self.args.train_batch_size,
sampler=train_sampler,
collate_fn=self.data_collator,
drop_last=self.args.dataloader_drop_last,
)
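The docstring above suggests subclassing to inject custom behavior. A sketch of that pattern, following the transformers==3.1.0 attributes shown in this source, that forces a sequential, non-shuffled dataloader could look like this:

from torch.utils.data import DataLoader, SequentialSampler
from transformers import Trainer

class SequentialTrainer(Trainer):
    def get_train_dataloader(self) -> DataLoader:
        if self.train_dataset is None:
            raise ValueError("Trainer: training requires a train_dataset.")
        return DataLoader(
            self.train_dataset,
            batch_size=self.args.train_batch_size,
            sampler=SequentialSampler(self.train_dataset),  # keep the original example order
            collate_fn=self.data_collator,
            drop_last=self.args.dataloader_drop_last,
        )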
# This might change the seed so needs to run first.
self._hp_search_setup(trial)
# Model re-init
if self.model_init is not None:
# Seed must be set before instantiating the model when using model_init.
set_seed(self.args.seed)
model = self.model_init()
self.model = model.to(self.args.device)
# Reinitializes optimizer and scheduler
self.optimizer, self.lr_scheduler = None, None
# Data loader and number of training steps
train_dataloader = self.get_train_dataloader()
if self.args.max_steps > 0:
t_total = self.args.max_steps
num_train_epochs = (
self.args.max_steps // (len(train_dataloader) // self.args.gradient_accumulation_steps) + 1
)
else:
t_total = int(len(train_dataloader) // self.args.gradient_accumulation_steps * self.args.num_train_epochs)
num_train_epochs = self.args.num_train_epochs
self.args.max_steps = t_total
self.create_optimizer_and_scheduler(num_training_steps=t_total)
# Check if saved optimizer or scheduler states exist
if (
model_path is not None
and os.path.isfile(os.path.join(model_path, "optimizer.pt"))
and os.path.isfile(os.path.join(model_path, "scheduler.pt"))
):
# Load in optimizer and scheduler states
self.optimizer.load_state_dict(
torch.load(os.path.join(model_path, "optimizer.pt"), map_location=self.args.device)
)
self.lr_scheduler.load_state_dict(torch.load(os.path.join(model_path, "scheduler.pt")))
model = self.model
if self.args.fp16 and _use_apex:
if not is_apex_available():
raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use fp16 training.")
model, self.optimizer = amp.initialize(model, self.optimizer, opt_level=self.args.fp16_opt_level)
# multi-gpu training (should be after apex fp16 initialization)
if self.args.n_gpu > 1:
model = torch.nn.DataParallel(model)
# Distributed training (should be after apex fp16 initialization)
if self.args.local_rank != -1:
model = torch.nn.parallel.DistributedDataParallel(
model,
device_ids=[self.args.local_rank],
output_device=self.args.local_rank,
find_unused_parameters=True,
)
if self.tb_writer is not None:
self.tb_writer.add_text("args", self.args.to_json_string())
self.tb_writer.add_hparams(self.args.to_sanitized_dict(), metric_dict={})
Trainer.train() starts the training
The following is the source code of the train() method of the transformers.Trainer class in the CPU build of transformers==3.1.0.

Before using the Trainer class, an instance is usually created first.
# 9. Training loop
# Train!
if is_torch_tpu_available():
total_train_batch_size = self.args.train_batch_size * xm.xrt_world_size()
else:
total_train_batch_size = (
self.args.train_batch_size
* self.args.gradient_accumulation_steps
* (torch.distributed.get_world_size() if self.args.local_rank != -1 else 1)
)
logger.info("***** Running training *****")
logger.info(" Num examples = %d", self.num_examples(train_dataloader))# 训练示例数量
logger.info(" Num Epochs = %d", num_train_epochs)#训练的总 Epoch 数
logger.info(" Instantaneous batch size per device = %d", self.args.per_device_train_batch_size)#每个设备上的批量大小
logger.info(" Total train batch size (w. parallel, distributed & accumulation) = %d", total_train_batch_size)#并行、分布式和梯度累积后的总训练批量大小
logger.info(" Gradient Accumulation steps = %d", self.args.gradient_accumulation_steps)#梯度累积步数
logger.info(" Total optimization steps = %d", t_total)#总优化步骤数
# 这些变量用于跟踪训练过程中的全局步骤、当前 Epoch、已经训练的 Epoch 数和当前 Epoch 中已经训练的步骤数。
self.global_step = 0
self.epoch = 0
epochs_trained = 0
steps_trained_in_current_epoch = 0
# Checkpoint resumption: if model_path is given, try to parse global_step, epochs_trained and steps_trained_in_current_epoch from it and continue training from there; if parsing fails, start training from scratch.
# Check if continuing training from a checkpoint
if model_path is not None:
# set global_step to global_step of last saved checkpoint from model path
try:
self.global_step = int(model_path.split("-")[-1].split(os.path.sep)[0])
epochs_trained = self.global_step // (len(train_dataloader) // self.args.gradient_accumulation_steps)
steps_trained_in_current_epoch = self.global_step % (
len(train_dataloader) // self.args.gradient_accumulation_steps
)
logger.info(" Continuing training from checkpoint, will skip to saved global_step")
logger.info(" Continuing training from epoch %d", epochs_trained)
logger.info(" Continuing training from global step %d", self.global_step)
logger.info(" Will skip the first %d steps in the first epoch", steps_trained_in_current_epoch)
except ValueError:
self.global_step = 0
logger.info(" Starting fine-tuning.")
# Initialize the loss and optimizer state: tr_loss is the running training loss, logging_loss_scalar is the loss scalar used for logging, and model.zero_grad() zeroes the model gradients.
tr_loss = torch.tensor(0.0).to(self.args.device)
logging_loss_scalar = 0.0
model.zero_grad()
# Training loop
disable_tqdm = self.args.disable_tqdm or not self.is_local_process_zero()  # disable_tqdm controls whether progress bars are shown
train_pbar = trange(epochs_trained, int(np.ceil(num_train_epochs)), desc="Epoch", disable=disable_tqdm)  # train_pbar shows the epoch progress
# range(epochs_trained, int(np.ceil(num_train_epochs))): resume from the number of epochs already trained and run the remaining ones. If a distributed sampler is used, set the current epoch on it.
for epoch in range(epochs_trained, int(np.ceil(num_train_epochs))):
if isinstance(train_dataloader, DataLoader) and isinstance(train_dataloader.sampler, DistributedSampler):
train_dataloader.sampler.set_epoch(epoch)
# Data loading and iteration: pick the dataloader depending on the device type (TPU or GPU). If past_index >= 0, reset the past memory state.
if is_torch_tpu_available():
parallel_loader = pl.ParallelLoader(train_dataloader, [self.args.device]).per_device_loader(
self.args.device
)
epoch_iterator = parallel_loader
else:
epoch_iterator = train_dataloader
# Reset the past mems state at the beginning of each epoch if necessary.
if self.args.past_index >= 0:
self._past = None
#
epoch_pbar = tqdm(epoch_iterator, desc="Iteration", disable=disable_tqdm)
# Iterate over the epoch. `inputs` usually contains: input_ids (token indices of the input text), attention_mask (indicates which tokens the model should attend to) and labels (the target labels).
for step, inputs in enumerate(epoch_iterator):
# Skip past any already trained steps if resuming training
if steps_trained_in_current_epoch > 0:
steps_trained_in_current_epoch -= 1
epoch_pbar.update(1)
continue
tr_loss += self.training_step(model, inputs)  # feed the inputs to the model for one training step
if (step + 1) % self.args.gradient_accumulation_steps == 0 or (
# last step in epoch but step is always smaller than gradient_accumulation_steps
len(epoch_iterator) <= self.args.gradient_accumulation_steps
and (step + 1) == len(epoch_iterator)
):
if self.args.fp16 and _use_native_amp:
self.scaler.unscale_(self.optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), self.args.max_grad_norm)
elif self.args.fp16 and _use_apex:
torch.nn.utils.clip_grad_norm_(amp.master_params(self.optimizer), self.args.max_grad_norm)
else:
torch.nn.utils.clip_grad_norm_(model.parameters(), self.args.max_grad_norm)
if is_torch_tpu_available():
xm.optimizer_step(self.optimizer)
elif self.args.fp16 and _use_native_amp:
self.scaler.step(self.optimizer)
self.scaler.update()
else:
self.optimizer.step()
self.lr_scheduler.step()
model.zero_grad()
self.global_step += 1
self.epoch = epoch + (step + 1) / len(epoch_iterator)
if (self.args.logging_steps > 0 and self.global_step % self.args.logging_steps == 0) or (
self.global_step == 1 and self.args.logging_first_step
):
logs: Dict[str, float] = {}
tr_loss_scalar = tr_loss.item()
logs["loss"] = (tr_loss_scalar - logging_loss_scalar) / self.args.logging_steps
# backward compatibility for pytorch schedulers
logs["learning_rate"] = (
self.lr_scheduler.get_last_lr()[0]
if version.parse(torch.__version__) >= version.parse("1.4")
else self.lr_scheduler.get_lr()[0]
)
logging_loss_scalar = tr_loss_scalar
self.log(logs)
if self.args.evaluate_during_training and self.global_step % self.args.eval_steps == 0:
metrics = self.evaluate()
self._report_to_hp_search(trial, epoch, metrics)
if self.args.save_steps > 0 and self.global_step % self.args.save_steps == 0:
# In all cases (even distributed/parallel), self.model is always a reference
# to the model we want to save.
if hasattr(model, "module"):
assert (
model.module is self.model
), f"Module {model.module} should be a reference to self.model"
else:
assert model is self.model, f"Model {model} should be a reference to self.model"
# Save model checkpoint
checkpoint_folder = f"{PREFIX_CHECKPOINT_DIR}-{self.global_step}"
if self.hp_search_backend is not None and trial is not None:
run_id = (
trial.number
if self.hp_search_backend == HPSearchBackend.OPTUNA
else tune.get_trial_id()
)
checkpoint_folder += f"-run-{run_id}"
output_dir = os.path.join(self.args.output_dir, checkpoint_folder)
self.save_model(output_dir)
if self.is_world_process_zero():
self._rotate_checkpoints()
if is_torch_tpu_available():
xm.rendezvous("saving_optimizer_states")
xm.save(self.optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
xm.save(self.lr_scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
elif self.is_world_process_zero():
torch.save(self.optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
torch.save(self.lr_scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
epoch_pbar.update(1)
if self.args.max_steps > 0 and self.global_step >= self.args.max_steps:
break
epoch_pbar.close()
train_pbar.update(1)
if self.args.tpu_metrics_debug or self.args.debug:
if is_torch_tpu_available():
# tpu-comment: Logging debug metrics for PyTorch/XLA (compile, execute times, ops, etc.)
xm.master_print(met.metrics_report())
else:
logger.warning(
"You enabled PyTorch/XLA debug metrics but you don't have a TPU "
"configured. Check your training configuration if this is unexpected."
)
if self.args.max_steps > 0 and self.global_step >= self.args.max_steps:
break
Figure: overview of how training starts.

tr_loss += self.training_step(model, inputs) -> calls DAGN.forward() to obtain the model outputs
def training_step(self, model: nn.Module, inputs: Dict[str, Union[torch.Tensor, Any]]) -> torch.Tensor:
"""
Perform a training step on a batch of inputs.
Subclass and override to inject custom behavior.
Args:
model (:obj:`nn.Module`):
The model to train.
inputs (:obj:`Dict[str, Union[torch.Tensor, Any]]`):
The inputs and targets of the model.
The dictionary will be unpacked before being fed to the model. Most models expect the targets under the
argument :obj:`labels`. Check your model's documentation for all accepted arguments.
Return:
:obj:`torch.Tensor`: The tensor with training loss on this batch.
"""
if hasattr(self, "_training_step"):
warnings.warn(
"The `_training_step` method is deprecated and won't be called in a future version, define `training_step` in your subclass.",
FutureWarning,
)
return self._training_step(model, inputs, self.optimizer)
model.train()
inputs = self._prepare_inputs(inputs)
if self.args.fp16 and _use_native_amp:
with autocast():
outputs = model(**inputs)
loss = outputs[0]
else:
outputs = model(**inputs)  # call the forward() method of DAGN to get the outputs
# We don't use .loss here since the model may return tuples instead of ModelOutput.
loss = outputs[0]
Obtaining the outputs calls the following code in DAGN.py:
class DAGN():
def forward():
# input_ids.size()=torch.Size([32,128])
flat_input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
flat_attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None
flat_token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None
flat_passage_mask = passage_mask.view(-1, passage_mask.size(-1)) if passage_mask is not None else None
flat_question_mask = question_mask.view(-1, question_mask.size(-1)) if question_mask is not None else None
flat_argument_bpe_ids = argument_bpe_ids.view(-1, argument_bpe_ids.size(-1)) if argument_bpe_ids is not None else None
flat_domain_bpe_ids = domain_bpe_ids.view(-1, domain_bpe_ids.size(-1)) if domain_bpe_ids is not None else None
flat_punct_bpe_ids = punct_bpe_ids.view(-1, punct_bpe_ids.size(-1)) if punct_bpe_ids is not None else None
# Get the last-layer output of the RoBERTa model from the input IDs and attention mask; last_hidden_state has shape torch.Size([32, 128, 768]). Each input token is encoded by BERT into a vector of size hidden_size (768 for BERT-base) at its position.
last_hidden_state, p = self.roberta(flat_input_ids, attention_mask=flat_attention_mask, token_type_ids=None, return_dict = False)
#last_hidden_state, p = self.bert(flat_input_ids, attention_mask=flat_attention_mask, token_type_ids=None, return_dict = False)
sequence_output = last_hidden_state  # torch.Size([32, 128, 768])
pooled_output = p  # torch.Size([32, 768])
The outputs returned by the RoBERTa model used in forward() above are described by the following dataclass:
@dataclass
class BaseModelOutputWithPooling(ModelOutput):
"""
Base class for model's outputs that also contains a pooling of the last hidden states.
Args:
last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the model.
pooler_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, hidden_size)`):
Last layer hidden-state of the first token of the sequence (classification token)
further processed by a Linear layer and a Tanh activation function. The Linear
layer weights are trained from the next sentence prediction (classification)
objective during pretraining.
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
"""
last_hidden_state: torch.FloatTensor
pooler_output: torch.FloatTensor = None
hidden_states: Optional[Tuple[torch.FloatTensor]] = None
attentions: Optional[Tuple[torch.FloatTensor]] = None
Figure: step-by-step computation of the loss in DAGN's forward().

At line 464 of dagn.py, when computing the loss, the original code on GitHub calls the wrong loss-computation function; change it to:
loss2 = self.get_con_lossL(positive_keys2, negative_keys2)
Early stopping and evaluation strategy in Trainer

Early stopping usually has to be configured together with the evaluation strategy (evaluation_strategy), so that the model's performance is checked after every evaluation period and training can be stopped accordingly. The early-stopping mechanism relies on the evaluation results, so evaluation must run periodically for the early-stopping logic to trigger. A wiring sketch follows the field definitions below.
# The early-stopping strategy and the evaluation strategy must be used together; otherwise training never knows when to stop.
early_stopping_patience: int = field(  # early stopping
    default=1,
    metadata={"help": "Number of evaluations with no improvement after which training will be stopped."}
)
evaluation_strategy: str = field(  # evaluation strategy: evaluate the model on the evaluation set at the end of each epoch
    default="epoch",  # the default is "no", i.e. no evaluation during training
    metadata={"help": "The evaluation strategy to adopt during training."}
)
metric_for_best_model: str = field(  # metric used for early stopping / selecting the best model
    default="pearson",
    metadata={"help": "The metric to use to compare models."}
)
greater_is_better: bool = field(  # whether a larger metric value is better (true for the Pearson correlation)
    default=True,
    metadata={"help": "Whether the better metric is greater or not."}
)
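A sketch of wiring these pieces together with the built-in EarlyStoppingCallback (available in more recent transformers releases; model, datasets and compute_metrics are assumed to be defined as in the earlier sections, and load_best_model_at_end=True is required for metric_for_best_model to take effect):

from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir='./output',
    evaluation_strategy='epoch',      # evaluate on the eval set at the end of every epoch
    save_strategy='epoch',            # checkpoints must align with evaluation for best-model tracking
    load_best_model_at_end=True,
    metric_for_best_model='pearson',  # metric used to compare checkpoints
    greater_is_better=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1)],  # stop after 1 evaluation with no improvement
)
trainer.train()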
References

Huggingface Chinese tutorial: https://github.com/lansinuote/Huggingface_Toturials