from transformers import BertModel

Reference: https://blog.csdn.net/meiqi0538/article/details/124891560
In the BERT model, out.last_hidden_state[:, 0, :] and pooler_output both correspond to the hidden state of the first token of the input sequence (usually the [CLS] token), but they are not identical.

out.last_hidden_state[:, 0, :] is the hidden state of the last layer, taken directly from the model output.

pooler_output is the result of passing that hidden state through an additional fully connected layer followed by a Tanh activation. The weights of this fully connected layer are learned during pretraining and are intended for downstream tasks (such as text classification).

In some cases out.last_hidden_state[:, 0, :] and pooler_output may have similar values, but they are not exactly the same. If you observe that they are identical in your model, it may be because the model has not been trained and the fully connected layer still has its initial weights, or because something went wrong during training.
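To check this relationship directly, here is a minimal sketch (assuming the bert-base-chinese checkpoint; any BERT checkpoint works the same way) that reproduces pooler_output by applying the pooler's dense layer and Tanh to the [CLS] hidden state:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = BertModel.from_pretrained('bert-base-chinese')

inputs = tokenizer('这部电影很好看', return_tensors='pt')
with torch.no_grad():
    out = model(**inputs)

cls_hidden = out.last_hidden_state[:, 0, :]   # raw [CLS] hidden state from the last layer
pooled = out.pooler_output                    # dense + Tanh applied to the [CLS] hidden state
manual_pooled = torch.tanh(model.pooler.dense(cls_hidden))  # reproduce pooler_output manually

print(torch.allclose(pooled, manual_pooled, atol=1e-5))  # True
print(torch.allclose(pooled, cls_hidden))                # False in general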

The evaluate library

Because of network restrictions, the following code cannot be run directly:

import evaluate
metrics_list = evaluate.list_evaluation_modules(module_type='metric')
print(metrics_list)

Following the suggestion from the mirror site https://hf-mirror.com/, change the built-in HF_ENDPOINT.
In the config.py file of the evaluate package, modify it as follows:

# Hub
# HF_ENDPOINT = os.environ.get("HF_ENDPOINT", "https://huggingface.co")
HF_ENDPOINT = os.environ.get("HF_ENDPOINT", "https://hf-mirror.com")

Remember to restart the environment (or kernel) after making the change.
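Alternatively, instead of editing the library source, you can set HF_ENDPOINT as an environment variable before importing evaluate (a sketch; it assumes the mirror is reachable, and the import order matters because the endpoint is read at import time):

import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

import evaluate  # must be imported after HF_ENDPOINT is set
metrics_list = evaluate.list_evaluation_modules(module_type='metric')
print(len(metrics_list), metrics_list[:5])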

The datasets library

Basic usage of saving and loading datasets with datasets

load_dataset() vs load_from_disk()

load_dataset and load_from_disk are two different functions in the Hugging Face datasets library. Both can be used to load datasets, but their usage and purpose differ.
load_dataset is used to load predefined datasets from the Hugging Face datasets library, or user-defined datasets. When a local path is passed as the argument, load_dataset tries to load the dataset script and data files under that path. A dataset script is a Python script that defines how the data is processed and loaded. During loading, a copy of the predefined data is automatically downloaded from the Hugging Face Hub.

dataset = load_dataset(path='path/to/dataset/directory')

load_from_disk is used to load a dataset that was previously saved to disk with Dataset.save_to_disk. This function does not need a dataset script, because the data has already been processed and formatted and can be loaded directly.

# Save the result returned by load_dataset. Note: if you save only the "train" split (or another split)
# on its own, the parameter below should be dataset_path instead, because what is being saved is then a
# Dataset rather than a DatasetDict.
dataset.save_to_disk(dataset_dict_path='./data/ChnSentiCorp')
# Load a dataset that was saved with save_to_disk
dataset = load_from_disk('path/to/dataset/directory')

So when you load a dataset from a local path with load_dataset, you need to provide the dataset script and data files; when you load from a local path with load_from_disk, you only need to provide the path where the dataset was previously saved.

from datasets import load_dataset
# Load the data
dataset = load_dataset(path='data/lansinuote/ChnSentiCorp',split='train')
dataset

Saving data in other formats

# Local path
dataset = load_dataset(path='data/lansinuote/ChnSentiCorp')
# Remote path (Hugging Face Hub)
dataset = load_dataset(path='lansinuote/ChnSentiCorp')
# Save to disk in the default format
dataset.save_to_disk(dataset_dict_path='./data/ChnSentiCorp')
# Load from disk in the default format
dataset = load_from_disk('./data/ChnSentiCorp')
# Chapter 3: export to CSV format; all data is stored in a single ChnSentiCorp.csv file
dataset = load_dataset(path='lansinuote/ChnSentiCorp', split='train')
dataset.to_csv(path_or_buf='./data/ChnSentiCorp.csv')

# Load data in CSV format
csv_dataset = load_dataset(path='csv',
                           data_files='./data/ChnSentiCorp.csv',
                           split='train')
csv_dataset[20]
# Chapter 3: export to JSON format; all data is stored in a single ChnSentiCorp.json file
dataset = load_dataset(path='lansinuote/ChnSentiCorp', split='train')
dataset.to_json(path_or_buf='./data/ChnSentiCorp.json')

# Load data in JSON format
json_dataset = load_dataset(path='json',
                            data_files='./data/ChnSentiCorp.json',
                            split='train')
json_dataset[20]

transformers

transformers.Trainer

Dataset preparation

trainer.train() needs to be fed a class that inherits from torch.utils.data.Dataset and is implemented around your own data. The datasets library is just a convenient data store for reading data; for actual training you still need to write such a subclass.

The class usually needs to return, for a given index, the dictionary produced by tokenizer() for the corresponding sample from datasets. This dictionary typically contains the keys input_ids, attention_mask and token_type_ids, corresponding to the input IDs, the attention mask and the token types.
input_ids: a list of the vocabulary IDs of each token. The model uses these IDs to look up the embedding of each token.
The difference between attention_mask and token_type_ids is that attention_mask indicates which positions are padding, while token_type_ids indicates which tokens belong to the first sentence and which to the second. For data consisting of a single sentence, token_type_ids is usually not needed. You can also add further entries such as 'labels' to the dictionary. So at a minimum the returned dictionary contains 'input_ids' and 'attention_mask'.

{'input_ids': tensor([    0,  2847,    38,    21, XXXXXX    4,     2,     1,     1,
            1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
            1,     1,     XXX]), 
            
            'attention_mask': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, XXX 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, XXX]), 
        
        'labels': tensor(1.)}

This dataset, together with the model that consumes the data, is then fed into Trainer(), and trainer.train() uses the data and the model to run the training automatically.
It follows that what exactly your torch.utils.data.Dataset subclass returns can be tailored to your situation: if it simply returns the (lightly processed) raw data, then the model is usually defined to match what the dataset yields, and operations such as tokenization and embedding lookup are performed inside the model. A minimal sketch of such a dataset class is shown below.
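As a concrete illustration, here is a minimal sketch of such a class (the checkpoint name bert-base-chinese and the field names text/label are assumptions based on the ChnSentiCorp dataset used earlier; adjust them for your own data):

import torch
from torch.utils.data import Dataset
from datasets import load_dataset
from transformers import BertTokenizer

class SentimentDataset(Dataset):
    def __init__(self, split='train', max_length=128):
        self.data = load_dataset(path='lansinuote/ChnSentiCorp', split=split)
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        item = self.data[index]  # assumed fields: 'text' and 'label'
        encoded = self.tokenizer(item['text'],
                                 padding='max_length',
                                 truncation=True,
                                 max_length=self.max_length,
                                 return_tensors='pt')
        return {
            'input_ids': encoded['input_ids'].squeeze(0),
            'attention_mask': encoded['attention_mask'].squeeze(0),
            'labels': torch.tensor(item['label'], dtype=torch.long),
        }

train_dataset = SentimentDataset('train')
print(train_dataset[0]['input_ids'].shape)  # torch.Size([128])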

  • Supplement: how do you use input_ids to look up each token's embedding?

In a transformer model, each token's embedding is obtained from an embedding layer that is part of the model. You can access this embedding layer through the model's embeddings attribute.

Here is a simple example showing how to use input_ids to look up each token's embedding:

# Suppose you already have a model and some input_ids
model = ...
input_ids = ...

# Get the embeddings
embeddings = model.embeddings(input_ids)

In this example, model.embeddings(input_ids) returns a tensor of shape (batch_size, sequence_length, embedding_dim); each element of this tensor is the embedding of one token.

Note that this example assumes your model has an embeddings attribute that accepts input_ids and returns embeddings. Different models may expose different interfaces, so you may need to check your model's documentation to see how to obtain embeddings.
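As a runnable sketch (assuming the bert-base-chinese checkpoint), the snippet below contrasts model.embeddings, which is BERT-specific and also adds position and token-type embeddings, with the more portable get_input_embeddings(), which returns only the word-embedding lookup table:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = BertModel.from_pretrained('bert-base-chinese')

input_ids = tokenizer('你好,世界', return_tensors='pt')['input_ids']

full_embeddings = model.embeddings(input_ids)              # word + position + token-type embeddings
word_embeddings = model.get_input_embeddings()(input_ids)  # word-embedding lookup only

print(full_embeddings.shape)  # (batch_size, sequence_length, hidden_size)
print(word_embeddings.shape)  # (batch_size, sequence_length, hidden_size)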

Training a model directly with trainer.train()

from transformers import (
    AutoConfig,    # load the model configuration from a pretrained model
    AutoModel,
    AutoTokenizer, # load the tokenizer from a pretrained model
    EvalPrediction,
    HfArgumentParser,
    Trainer, AdamW, # import Trainer
    TrainingArguments,
    set_seed,
)

trainer = Trainer(
    model=model,
    args=training_args,              # transformers.TrainingArguments: training configuration such as the learning rate, batch size, number of epochs, etc.
    train_dataset=train_dataset,     # training dataset
    eval_dataset=eval_dataset,       # evaluation dataset
    compute_metrics=compute_metrics, # function that computes the evaluation metrics
)

trainer.train(  # pass the path of the pretrained model and start the training process
    model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
)
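For reference, a minimal sketch of what the training_args passed above might look like (the values are illustrative assumptions, not the settings used in this project):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./outputs',            # where checkpoints and logs are written
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    logging_steps=50,
    save_steps=500,
    seed=42,
)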
class Trainer:
    # This code chooses an appropriate sampler for the training dataset, depending on its type and on the hardware environment.
    def _get_train_sampler(self) -> Optional[torch.utils.data.sampler.Sampler]:
        if isinstance(self.train_dataset, torch.utils.data.IterableDataset):
            return None
        elif is_torch_tpu_available():
            return get_tpu_sampler(self.train_dataset)
        else:
            return (
                RandomSampler(self.train_dataset)
                if self.args.local_rank == -1
                else DistributedSampler(self.train_dataset)
            )
class Trainer:
    # This code builds a DataLoader for the training dataset by selecting and configuring a sampler, and returns it for use during training.
    def get_train_dataloader(self) -> DataLoader:
        """
        Returns the training :class:`~torch.utils.data.DataLoader`.

        Will use no sampler if :obj:`self.train_dataset` is a :obj:`torch.utils.data.IterableDataset`, a random sampler
        (adapted to distributed training if necessary) otherwise.

        Subclass and override this method if you want to inject some custom behavior.
        """
        if self.train_dataset is None:
            raise ValueError("Trainer: training requires a train_dataset.")
        train_sampler = self._get_train_sampler()

        return DataLoader(
            self.train_dataset,
            batch_size=self.args.train_batch_size,
            sampler=train_sampler,
            collate_fn=self.data_collator,
            drop_last=self.args.dataloader_drop_last,
        )
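        # Note: everything from here to the end of this snippet is excerpted from the body of
        # Trainer.train() (hyperparameter-search setup, model re-initialization, optimizer/scheduler
        # creation, and restoring optimizer/scheduler state from a checkpoint), not from
        # get_train_dataloader().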
        # This might change the seed so needs to run first.
        self._hp_search_setup(trial)

        # Model re-init
        if self.model_init is not None:
            # Seed must be set before instantiating the model when using model_init.
            set_seed(self.args.seed)
            model = self.model_init()
            self.model = model.to(self.args.device)

            # Reinitializes optimizer and scheduler
            self.optimizer, self.lr_scheduler = None, None

        # Data loader and number of training steps
        train_dataloader = self.get_train_dataloader()
        if self.args.max_steps > 0:
            t_total = self.args.max_steps
            num_train_epochs = (
                self.args.max_steps // (len(train_dataloader) // self.args.gradient_accumulation_steps) + 1
            )
        else:
            t_total = int(len(train_dataloader) // self.args.gradient_accumulation_steps * self.args.num_train_epochs)
            num_train_epochs = self.args.num_train_epochs
            self.args.max_steps = t_total

        self.create_optimizer_and_scheduler(num_training_steps=t_total)
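        # Worked example for t_total above: with len(train_dataloader) = 1000 batches,
        # gradient_accumulation_steps = 4 and num_train_epochs = 3, t_total = int(1000 // 4 * 3) = 750
        # optimizer steps; with max_steps = 500 instead, t_total = 500 and training spans
        # 500 // (1000 // 4) + 1 = 3 epochs.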

        # Check if saved optimizer or scheduler states exist
        if (
            model_path is not None
            and os.path.isfile(os.path.join(model_path, "optimizer.pt"))
            and os.path.isfile(os.path.join(model_path, "scheduler.pt"))
        ):
            # Load in optimizer and scheduler states
            self.optimizer.load_state_dict(
                torch.load(os.path.join(model_path, "optimizer.pt"), map_location=self.args.device)
            )
            self.lr_scheduler.load_state_dict(torch.load(os.path.join(model_path, "scheduler.pt")))

        model = self.model
        if self.args.fp16 and _use_apex:
            if not is_apex_available():
                raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use fp16 training.")
            model, self.optimizer = amp.initialize(model, self.optimizer, opt_level=self.args.fp16_opt_level)

        # multi-gpu training (should be after apex fp16 initialization)
        if self.args.n_gpu > 1:
            model = torch.nn.DataParallel(model)

        # Distributed training (should be after apex fp16 initialization)
        if self.args.local_rank != -1:
            model = torch.nn.parallel.DistributedDataParallel(
                model,
                device_ids=[self.args.local_rank],
                output_device=self.args.local_rank,
                find_unused_parameters=True,
            )

        if self.tb_writer is not None:
            self.tb_writer.add_text("args", self.args.to_json_string())
            self.tb_writer.add_hparams(self.args.to_sanitized_dict(), metric_dict={})

Starting training with Trainer.train()

The following is taken from the source code of the train() method of the transformers.Trainer class in the CPU build of transformers==3.1.0.
Before using the Trainer class, you normally instantiate it first.

        # 9. Training loop
		# Train!
        if is_torch_tpu_available():
            total_train_batch_size = self.args.train_batch_size * xm.xrt_world_size()
        else:
            total_train_batch_size = (
                self.args.train_batch_size
                * self.args.gradient_accumulation_steps
                * (torch.distributed.get_world_size() if self.args.local_rank != -1 else 1)
            )
        logger.info("***** Running training *****")
        logger.info("  Num examples = %d", self.num_examples(train_dataloader))# 训练示例数量
        logger.info("  Num Epochs = %d", num_train_epochs)#训练的总 Epoch 数
        logger.info("  Instantaneous batch size per device = %d", self.args.per_device_train_batch_size)#每个设备上的批量大小
        logger.info("  Total train batch size (w. parallel, distributed & accumulation) = %d", total_train_batch_size)#并行、分布式和梯度累积后的总训练批量大小
        logger.info("  Gradient Accumulation steps = %d", self.args.gradient_accumulation_steps)#梯度累积步数
        logger.info("  Total optimization steps = %d", t_total)#总优化步骤数
        # These variables track the global step, the current epoch, the number of epochs already trained, and the number of steps already trained in the current epoch.
        self.global_step = 0
        self.epoch = 0
        epochs_trained = 0
        steps_trained_in_current_epoch = 0
        # Checkpoint resumption: this block restores training progress from a checkpoint. If model_path is given, it tries to parse global_step, epochs_trained and steps_trained_in_current_epoch from it so training can continue; if parsing fails, training starts from scratch.
        # Check if continuing training from a checkpoint
        if model_path is not None:
            # set global_step to global_step of last saved checkpoint from model path
            try:
                self.global_step = int(model_path.split("-")[-1].split(os.path.sep)[0])
                epochs_trained = self.global_step // (len(train_dataloader) // self.args.gradient_accumulation_steps)
                steps_trained_in_current_epoch = self.global_step % (
                    len(train_dataloader) // self.args.gradient_accumulation_steps
                )

                logger.info("  Continuing training from checkpoint, will skip to saved global_step")
                logger.info("  Continuing training from epoch %d", epochs_trained)
                logger.info("  Continuing training from global step %d", self.global_step)
                logger.info("  Will skip the first %d steps in the first epoch", steps_trained_in_current_epoch)
            except ValueError:
                self.global_step = 0
                logger.info("  Starting fine-tuning.")
        # Initialize the loss and optimizer: tr_loss is the running training loss; logging_loss_scalar is the loss scalar used for logging; model.zero_grad() zeroes the model gradients.
        tr_loss = torch.tensor(0.0).to(self.args.device)
        logging_loss_scalar = 0.0
        model.zero_grad()
        
        # Training loop
        disable_tqdm = self.args.disable_tqdm or not self.is_local_process_zero()  # disable_tqdm controls whether progress bars are shown
        train_pbar = trange(epochs_trained, int(np.ceil(num_train_epochs)), desc="Epoch", disable=disable_tqdm)  # train_pbar: progress bar over the epochs
        # range(epochs_trained, int(np.ceil(num_train_epochs))): continue the remaining epochs, starting from the number of epochs already trained. If a distributed sampler is used, set the current epoch on it.
        for epoch in range(epochs_trained, int(np.ceil(num_train_epochs))):
            if isinstance(train_dataloader, DataLoader) and isinstance(train_dataloader.sampler, DistributedSampler):
                train_dataloader.sampler.set_epoch(epoch)
            # Data loading and iteration: choose the data loader depending on the device type (TPU or GPU). If past_index >= 0, reset the past memory state.
            if is_torch_tpu_available():
                parallel_loader = pl.ParallelLoader(train_dataloader, [self.args.device]).per_device_loader(
                    self.args.device
                )
                epoch_iterator = parallel_loader
            else:
                epoch_iterator = train_dataloader

            # Reset the past mems state at the beginning of each epoch if necessary.
            if self.args.past_index >= 0:
                self._past = None
            epoch_pbar = tqdm(epoch_iterator, desc="Iteration", disable=disable_tqdm)
            # Iterate over the epoch. inputs usually contains: input_ids (token indices of the input
            # text), attention_mask (which tokens the model should attend to) and labels (the targets).
            for step, inputs in enumerate(epoch_iterator):

                # Skip past any already trained steps if resuming training
                if steps_trained_in_current_epoch > 0:
                    steps_trained_in_current_epoch -= 1
                    epoch_pbar.update(1)
                    continue

                tr_loss += self.training_step(model, inputs)  # feed the inputs to the model and run one training step
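                # The optimizer step below runs only every gradient_accumulation_steps micro-batches
                # (or at the final batch of an epoch shorter than gradient_accumulation_steps);
                # between steps, gradients keep accumulating in the parameters' .grad buffers.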

                if (step + 1) % self.args.gradient_accumulation_steps == 0 or (
                    # last step in epoch but step is always smaller than gradient_accumulation_steps
                    len(epoch_iterator) <= self.args.gradient_accumulation_steps
                    and (step + 1) == len(epoch_iterator)
                ):
                    if self.args.fp16 and _use_native_amp:
                        self.scaler.unscale_(self.optimizer)
                        torch.nn.utils.clip_grad_norm_(model.parameters(), self.args.max_grad_norm)
                    elif self.args.fp16 and _use_apex:
                        torch.nn.utils.clip_grad_norm_(amp.master_params(self.optimizer), self.args.max_grad_norm)
                    else:
                        torch.nn.utils.clip_grad_norm_(model.parameters(), self.args.max_grad_norm)

                    if is_torch_tpu_available():
                        xm.optimizer_step(self.optimizer)
                    elif self.args.fp16 and _use_native_amp:
                        self.scaler.step(self.optimizer)
                        self.scaler.update()
                    else:
                        self.optimizer.step()

                    self.lr_scheduler.step()
                    model.zero_grad()
                    self.global_step += 1
                    self.epoch = epoch + (step + 1) / len(epoch_iterator)

                    if (self.args.logging_steps > 0 and self.global_step % self.args.logging_steps == 0) or (
                        self.global_step == 1 and self.args.logging_first_step
                    ):
                        logs: Dict[str, float] = {}
                        tr_loss_scalar = tr_loss.item()
                        logs["loss"] = (tr_loss_scalar - logging_loss_scalar) / self.args.logging_steps
                        # backward compatibility for pytorch schedulers
                        logs["learning_rate"] = (
                            self.lr_scheduler.get_last_lr()[0]
                            if version.parse(torch.__version__) >= version.parse("1.4")
                            else self.lr_scheduler.get_lr()[0]
                        )
                        logging_loss_scalar = tr_loss_scalar

                        self.log(logs)

                    if self.args.evaluate_during_training and self.global_step % self.args.eval_steps == 0:
                        metrics = self.evaluate()
                        self._report_to_hp_search(trial, epoch, metrics)

                    if self.args.save_steps > 0 and self.global_step % self.args.save_steps == 0:
                        # In all cases (even distributed/parallel), self.model is always a reference
                        # to the model we want to save.
                        if hasattr(model, "module"):
                            assert (
                                model.module is self.model
                            ), f"Module {model.module} should be a reference to self.model"
                        else:
                            assert model is self.model, f"Model {model} should be a reference to self.model"
                        # Save model checkpoint
                        checkpoint_folder = f"{PREFIX_CHECKPOINT_DIR}-{self.global_step}"
                        if self.hp_search_backend is not None and trial is not None:
                            run_id = (
                                trial.number
                                if self.hp_search_backend == HPSearchBackend.OPTUNA
                                else tune.get_trial_id()
                            )
                            checkpoint_folder += f"-run-{run_id}"
                        output_dir = os.path.join(self.args.output_dir, checkpoint_folder)

                        self.save_model(output_dir)

                        if self.is_world_process_zero():
                            self._rotate_checkpoints()

                        if is_torch_tpu_available():
                            xm.rendezvous("saving_optimizer_states")
                            xm.save(self.optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
                            xm.save(self.lr_scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
                        elif self.is_world_process_zero():
                            torch.save(self.optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
                            torch.save(self.lr_scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))

                epoch_pbar.update(1)
                if self.args.max_steps > 0 and self.global_step >= self.args.max_steps:
                    break
            epoch_pbar.close()
            train_pbar.update(1)
            if self.args.tpu_metrics_debug or self.args.debug:
                if is_torch_tpu_available():
                    # tpu-comment: Logging debug metrics for PyTorch/XLA (compile, execute times, ops, etc.)
                    xm.master_print(met.metrics_report())
                else:
                    logger.warning(
                        "You enabled PyTorch/XLA debug metrics but you don't have a TPU "
                        "configured. Check your training configuration if this is unexpected."
                    )
            if self.args.max_steps > 0 and self.global_step >= self.args.max_steps:
                break

Diagram of the start of training (figure omitted).

tr_loss += self.training_step(model, inputs) → calls DAGN.forward() to obtain the outputs

    def training_step(self, model: nn.Module, inputs: Dict[str, Union[torch.Tensor, Any]]) -> torch.Tensor:
        """
        Perform a training step on a batch of inputs.

        Subclass and override to inject custom behavior.

        Args:
            model (:obj:`nn.Module`):
                The model to train.
            inputs (:obj:`Dict[str, Union[torch.Tensor, Any]]`):
                The inputs and targets of the model.

                The dictionary will be unpacked before being fed to the model. Most models expect the targets under the
                argument :obj:`labels`. Check your model's documentation for all accepted arguments.

        Return:
            :obj:`torch.Tensor`: The tensor with training loss on this batch.
        """
        if hasattr(self, "_training_step"):
            warnings.warn(
                "The `_training_step` method is deprecated and won't be called in a future version, define `training_step` in your subclass.",
                FutureWarning,
            )
            return self._training_step(model, inputs, self.optimizer)

        model.train()
        inputs = self._prepare_inputs(inputs)

        if self.args.fp16 and _use_native_amp:
            with autocast():
                outputs = model(**inputs)
                loss = outputs[0]
        else:
            outputs = model(**inputs)  # call DAGN's forward() method to obtain the outputs
            # We don't use .loss here since the model may return tuples instead of ModelOutput.
            loss = outputs[0]

Obtaining the outputs requires calling the following in DAGN.py:
# Excerpt from DAGN.py (the forward() parameter list below is inferred from the arguments it uses).
class DAGN(nn.Module):
    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None,
                passage_mask=None, question_mask=None, argument_bpe_ids=None,
                domain_bpe_ids=None, punct_bpe_ids=None):
        # input_ids.size() = torch.Size([32, 128])
        flat_input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
        flat_attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None
        flat_token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None

        flat_passage_mask = passage_mask.view(-1, passage_mask.size(-1)) if passage_mask is not None else None
        flat_question_mask = question_mask.view(-1, question_mask.size(-1)) if question_mask is not None else None

        flat_argument_bpe_ids = argument_bpe_ids.view(-1, argument_bpe_ids.size(-1)) if argument_bpe_ids is not None else None
        flat_domain_bpe_ids = domain_bpe_ids.view(-1, domain_bpe_ids.size(-1)) if domain_bpe_ids is not None else None
        flat_punct_bpe_ids = punct_bpe_ids.view(-1, punct_bpe_ids.size(-1)) if punct_bpe_ids is not None else None
        # Get the last-layer output of the RoBERTa model from the flattened input_ids and attention_mask.
        # last_hidden_state has shape torch.Size([32, 128, 768]): every input token is encoded into a
        # vector of size hidden_size (768 in the base model) at its position.
        last_hidden_state, p = self.roberta(flat_input_ids, attention_mask=flat_attention_mask,
                                            token_type_ids=None, return_dict=False)
        # last_hidden_state, p = self.bert(flat_input_ids, attention_mask=flat_attention_mask, token_type_ids=None, return_dict=False)
        sequence_output = last_hidden_state  # torch.Size([32, 128, 768])
        pooled_output = p                    # torch.Size([32, 768])

(Figure omitted.) The transformers source file referenced above specifies the basic contents of the output returned by the RoBERTa model during forward():

@dataclass
class BaseModelOutputWithPooling(ModelOutput):
    """
    Base class for model's outputs that also contains a pooling of the last hidden states.

    Args:
        last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
            Sequence of hidden-states at the output of the last layer of the model.
        pooler_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, hidden_size)`):
            Last layer hidden-state of the first token of the sequence (classification token)
            further processed by a Linear layer and a Tanh activation function. The Linear
            layer weights are trained from the next sentence prediction (classification)
            objective during pretraining.
        hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
            of shape :obj:`(batch_size, sequence_length, hidden_size)`.

            Hidden-states of the model at the output of each layer plus the initial embedding outputs.
        attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
            :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

            Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
            heads.
    """

    last_hidden_state: torch.FloatTensor
    pooler_output: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None
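Accordingly, the two tensors used in DAGN's forward() can be obtained either by tuple unpacking with return_dict=False (as in the excerpt above) or by attribute access on this ModelOutput. A short sketch, assuming the roberta-base checkpoint:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('roberta-base')
model = AutoModel.from_pretrained('roberta-base')
inputs = tokenizer('a short test sentence', return_tensors='pt')

with torch.no_grad():
    # tuple style, as in DAGN.forward()
    last_hidden_state, pooled_output = model(**inputs, return_dict=False)
    # attribute style, via BaseModelOutputWithPooling
    outputs = model(**inputs, return_dict=True)

print(last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
print(pooled_output.shape)      # (batch_size, hidden_size)
print(torch.allclose(outputs.pooler_output, pooled_output))  # True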


Computing the loss step by step in DAGN's forward()

At line 464 of dagn.py, the original code on GitHub calls the wrong loss-computation function when computing the loss; change it to:
loss2 = self.get_con_lossL(positive_keys2, negative_keys2)

Early stopping and evaluation strategy in Trainer

Early stopping usually has to be configured together with an evaluation strategy (evaluation_strategy), so that after every evaluation period the model's performance is checked and a decision is made about whether to stop training. The early-stopping mechanism relies on evaluation results, so evaluation must run periodically for the early-stopping logic to be triggered. A sketch of how to combine the two follows the field definitions below.

    # The early-stopping strategy and the evaluation strategy must be used together,
    # otherwise training has no signal for when to stop.
    early_stopping_patience: int = field(  # early-stopping strategy
        default=1,
        metadata={"help": "Number of evaluations with no improvement after which training will be stopped."}
    )
    evaluation_strategy: str = field(  # evaluation strategy: run at the end of every epoch, evaluating the model on the evaluation data
        default="epoch",  # the library default is "no", i.e. no evaluation during training
        metadata={"help": "The evaluation strategy to adopt during training."}
    )
    metric_for_best_model: str = field(  # metric used by the early-stopping strategy
        default="pearson",
        metadata={"help": "The metric to use to compare models."}
    )
    greater_is_better: bool = field(  # whether larger metric values are better (true for the Pearson correlation)
        default=True,
        metadata={"help": "Whether the better metric is greater or not."}
    )
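A minimal sketch of wiring this together with Trainer (it assumes a transformers version that provides EarlyStoppingCallback and the evaluation_strategy/save_strategy arguments, i.e. a 4.x-era API, and reuses the model, datasets and compute_metrics objects defined earlier):

from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir='./outputs',
    evaluation_strategy='epoch',      # evaluate at the end of every epoch
    save_strategy='epoch',            # must match evaluation_strategy so the best checkpoint can be reloaded
    load_best_model_at_end=True,      # required by EarlyStoppingCallback
    metric_for_best_model='pearson',  # must match a key returned by compute_metrics
    greater_is_better=True,
    num_train_epochs=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1)],
)
trainer.train()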

References

Hugging Face Chinese tutorial: https://github.com/lansinuote/Huggingface_Toturials

Official documentation: https://huggingface.co/docs
