
Summary of Hung-yi Lee's Machine Learning Homework

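I first did these assignments four or five months ago, when I was just getting started with deep learning, but back then I rushed through them without thinking carefully about the details. Recently, during the Datawhale summer camp, building and training models gave me a hard time, and I realized how poor my hyperparameter tuning skills were. So I picked up Professor Lee's assignments again to work through them carefully from scratch.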

Homework1

Assignment Overview

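A regression problem: given patients with various symptoms, predict the probability of testing positive for COVID-19 from those symptoms.
Grading baselines: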
notion image

Tuning Log

【Simple Baseline】

Just run the original code as-is:
notion image

【Medium + Strong Baseline】

Modify the feature selection (by default, keep every column except the first 35, i.e. do not use the region as a training feature):
def select_feat(train_data, valid_data, test_data, select_all=True):
    '''Selects useful features to perform regression'''
    y_train, y_valid = train_data[:,-1], valid_data[:,-1]
    raw_x_train, raw_x_valid, raw_x_test = train_data[:,:-1], valid_data[:,:-1], test_data

    if select_all:
        feat_idx = list(range(raw_x_train.shape[1]))
    else:
        feat_idx = list(range(35, raw_x_train.shape[1])) # TODO: Select suitable feature columns.

    return raw_x_train[:,feat_idx], raw_x_valid[:,feat_idx], raw_x_test[:,feat_idx], y_train, y_valid
Then set select_all to False:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
config = {
    'seed': 5201314,      # Your seed number, you can pick your lucky number. :)
    'select_all': False,   # Whether to use all features.
    'valid_ratio': 0.2,   # validation_size = train_size * valid_ratio
    'n_epochs': 5000,     # Number of epochs.
    'batch_size': 256,
    'learning_rate': 1e-5,
    'early_stop': 600,    # If model has not improved for this many consecutive epochs, stop training.
    'save_path': './models/model.ckpt'  # Your model will be saved here.
}
Result of the run:
notion image
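Oddly, the TA's hint says that choosing specific features gets you to the medium baseline, while reaching the strong baseline also requires improving the model. Yet feature selection alone was enough to pass the strong baseline. It seems the saying that "data matters far more than the model" has some truth to it.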

【Boss Baseline】

The previous baseline showed that feature selection matters a great deal, so here we use a library to pick the best k features:
from sklearn.feature_selection import SelectKBest, f_regression

def select_feat(train_data, valid_data, test_data, select_all=True):
    '''Selects useful features to perform regression'''
    y_train, y_valid = train_data[:,-1], valid_data[:,-1]
    raw_x_train, raw_x_valid, raw_x_test = train_data[:,:-1], valid_data[:,:-1], test_data

    if select_all:
        feat_idx = list(range(raw_x_train.shape[1]))
    else:
        # TODO: Select suitable feature columns.
        selector = SelectKBest(score_func=f_regression, k=24)
        result = selector.fit(raw_x_train, y_train)
        idx = np.argsort(result.scores_)[::-1]
        feat_idx = list(np.sort(idx[:24]))

    return raw_x_train[:,feat_idx], raw_x_valid[:,feat_idx], raw_x_test[:,feat_idx], y_train, y_valid
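My understanding of this code: SelectKBest is a scikit-learn class that selects the k best features. The scoring criterion is given by the function f_regression, which, as the name suggests, scores features for regression tasks. After defining the selector (selector = SelectKBest(score_func=f_regression, k=24)), calling its fit method computes a score for every feature (result = selector.fit(raw_x_train, y_train)). The scores are then sorted in descending order, and the indices of the top k are used to slice out the features.
After repeated experiments, k between 20 and 24 worked best.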
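To make the scoring step concrete, here is a minimal dependency-free sketch of the statistic that f_regression computes for a single feature: the F-value derived from the Pearson correlation r, F = r^2 / (1 - r^2) * (n - 2). The helper names f_score and top_k_features and the toy data are made up for illustration; this is not scikit-learn's implementation.

```python
import math

def f_score(feature, target):
    """Univariate F-statistic from the Pearson correlation r."""
    n = len(feature)
    mean_x = sum(feature) / n
    mean_y = sum(target) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, target))
    var_x = sum((x - mean_x) ** 2 for x in feature)
    var_y = sum((y - mean_y) ** 2 for y in target)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    r2 = cov * cov / (var_x * var_y)
    if r2 >= 1.0:
        return math.inf  # perfectly linear relationship
    return r2 / (1.0 - r2) * (n - 2)

def top_k_features(columns, target, k):
    """Return the sorted indices of the k columns with the highest F-score."""
    scores = [f_score(col, target) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Toy data: column 0 is a noisy copy of the target, column 1 is unrelated,
# column 2 is exactly linear in the target (with a negative slope).
target = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = [
    [1.1, 1.9, 3.2, 3.9, 5.1],
    [2.0, -1.0, 2.0, -1.0, 2.0],
    [10.0, 8.0, 6.0, 4.0, 2.0],
]
selected = top_k_features(columns, target, k=2)
```

Because the score depends on r squared, a strongly negative correlation ranks as high as a strongly positive one. It only measures linear dependence on one feature at a time, which may be one reason the best k had to be found by trial and error.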
Next, adjust the model size and parameters, adding LeakyReLU, BatchNorm, and Dropout:
class My_Model(nn.Module):
    def __init__(self, input_dim):
        super(My_Model, self).__init__()
        # TODO: modify model's structure, be aware of dimensions.
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.LeakyReLU(0.2),
            nn.BatchNorm1d(64),
            nn.Dropout(0.1),
            nn.Linear(64, 16),
            nn.BatchNorm1d(16),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.1),
            nn.Linear(16, 1)
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.squeeze(1) # (B, 1) -> (B)
        return x
For training, add a learning-rate scheduler and switch to Adam:
    optimizer = torch.optim.Adam(model.parameters(), lr=config['learning_rate'] * 10, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=2,T_mult=2,eta_min=config['learning_rate'])
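About CosineAnnealingWarmRestarts: within each cycle the learning rate decays from its initial value toward eta_min along a cosine curve, then jumps back to the initial value. The n-th cycle lasts T_0 * T_mult^(n - 1) epochs, so here the cycles last 2, 4, 8, ... epochs, and this pattern repeats for the whole run.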
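The schedule can be traced by hand with a small dependency-free sketch of the cosine-with-restarts formula, eta_min + (base_lr - eta_min) * (1 + cos(pi * T_cur / T_i)) / 2. The function name warm_restart_lr is made up for illustration, and the sketch only covers integer epochs; it is not PyTorch's implementation.

```python
import math

def warm_restart_lr(epoch, base_lr, eta_min, t_0, t_mult):
    """Learning rate at an integer epoch under cosine annealing with warm restarts."""
    t_i, t_cur = t_0, epoch
    # Walk through the cycles until we find the one that contains this epoch.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i)) / 2

# Values from the setup above: lr = 1e-3 * 10 and eta_min = 1e-3, with T_0=2, T_mult=2.
schedule = [warm_restart_lr(e, 1e-2, 1e-3, t_0=2, t_mult=2) for e in range(15)]

# Epochs at which the rate is back at its initial value (the restarts).
restarts = [e for e, lr in enumerate(schedule) if abs(lr - 1e-2) < 1e-12]
```

With T_0=2 and T_mult=2 the restarts land on epochs 0, 2, 6 and 14, so the early cycles are very short and most of a long run is spent in a few long, slowly decaying cycles.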
Separately, AdamW performed worse than Adam here; there may be something I have not fully understood.
Finally, some parameter settings:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
config = {
    'seed': 5201314,      # Your seed number, you can pick your lucky number. :)
    'select_all': False,   # Whether to use all features.
    'valid_ratio': 0.2,   # validation_size = train_size * valid_ratio
    'n_epochs': 10000,     # Number of epochs.
    'batch_size': 256,
    'learning_rate': 1e-3,
    'early_stop': 1000,    # If model has not improved for this many consecutive epochs, stop training.
    'save_path': './models/model.ckpt'  # Your model will be saved here.
}
Final result:
notion image
This is just short of the boss baseline; the feature selection is probably still not good enough. I did not take it any further, though.

Homework2

Assignment Overview

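Given a number of recordings, each is split into short segments (frames), and deep learning is used to determine which phoneme the speaker is producing in each frame. In short, this is a classification problem.
Grading baselines: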
notion image

Tuning Log

【Simple Baseline】

As before, just run the TA's code once:
notion image

【Medium Baseline】

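According to the hints, reaching the medium baseline requires concatenating a suitable number of frames, which preserves as much of each phoneme's information as possible. The model also needs more layers.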
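Frame concatenation can be pictured with a small dependency-free sketch: each frame is joined with its k neighbors on either side, giving n = 2k + 1 frames per sample. The function name concat_frames is made up for illustration, and repeating the edge frame at the boundaries is an assumption of this sketch rather than a claim about the TA's code.

```python
def concat_frames(frames, n):
    """Concatenate each frame with its (n - 1) / 2 neighbors on each side."""
    assert n % 2 == 1, "n must be odd so the target frame sits in the center"
    k = n // 2
    last = len(frames) - 1
    out = []
    for i in range(len(frames)):
        window = []
        for j in range(i - k, i + k + 1):
            # Assumption: clamp out-of-range indices, i.e. repeat the edge frame.
            window.extend(frames[min(max(j, 0), last)])
        out.append(window)
    return out

# Three toy "frames" with 2 features each (the real frames have 39 features).
frames = [[1, 1], [2, 2], [3, 3]]
windows = concat_frames(frames, n=3)
```

Each output row has n times as many features as a single frame, which is why the model's input_dim is 39 * concat_nframes.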
First, add more layers to the block and use Dropout and BatchNorm:
class BasicBlock(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(BasicBlock, self).__init__()

        self.block = nn.Sequential(
            nn.Linear(input_dim, output_dim),
            nn.BatchNorm1d(output_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(output_dim, output_dim * 2),
            nn.BatchNorm1d(output_dim * 2),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(output_dim * 2, output_dim),
            nn.BatchNorm1d(output_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
        )

    def forward(self, x):
        x = self.block(x)
        return x
Next, change the hidden layer size and concatenate more frames (n = 11 here):
# data parameters
concat_nframes = 11              # the number of frames to concat with, n must be odd (total 2k+1 = n frames)
train_ratio = 0.8               # the ratio of data used for training, the rest will be used for validation

# training parameters
seed = 1213                        # random seed
batch_size = 512                # batch size
num_epoch = 10                   # the number of training epoch
learning_rate = 1e-4         # learning rate
model_path = './model.ckpt'     # the path where the checkpoint will be saved

# model parameters
input_dim = 39 * concat_nframes # the input dim of the model, you should not change the value
hidden_layers = 2               # the number of hidden layers
hidden_dim = 512                # the hidden dim
Result of the run:
notion image
As you can see, the result is not great. So I switched to a deeper and wider network (layers=6, dim=1024) and concatenated more frames (n=17), and the result improved:
notion image
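Exercise: the course slides ask us to run a small experiment on whether a deeper, narrower network or a shallower, wider one works better. Following the slides, I reran the model above with layers=2 and dim=1750, and the result was clearly better than the model above: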

【Strong Baseline】

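Following the same idea, we first deepen the model further, with layers=12 and dim=1024. I also added a softmax after the model's final output layer, but the result was surprisingly poor: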
notion image
I suspected the softmax was the problem, so I removed it and reran the experiment. The result improved, confirming that the softmax was indeed the cause:
notion image
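After some reading, I found that the cross-entropy loss (nn.CrossEntropyLoss in PyTorch) already applies a (log-)softmax internally, so adding another softmax in the model makes it hard to converge.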
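The effect of the extra softmax can be checked with plain arithmetic. The sketch below is a dependency-free illustration with made-up logits, not the training code: it computes the cross-entropy once on raw logits and once on logits that have already been passed through a softmax.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy on raw logits: -log(softmax(logits)[target])."""
    return -math.log(softmax(logits)[target])

num_classes = 41
logits = [20.0] + [0.0] * (num_classes - 1)  # a very confident, correct prediction

loss_plain = cross_entropy(logits, target=0)            # logits fed straight into the loss
loss_double = cross_entropy(softmax(logits), target=0)  # softmax applied in the model first

# When probabilities in [0, 1] are treated as logits, the loss cannot fall below this floor.
floor = math.log(1 + (num_classes - 1) / math.e)
```

Even for a perfect prediction, the double-softmax loss stays stuck near log(1 + 40 / e) instead of approaching zero, because the inputs to the loss can differ by at most 1. The gradients are squashed in the same way, which is consistent with the poor convergence observed above.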

【Boss Baseline】

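The TA's slides say that passing the boss baseline requires an RNN. My first thought was an LSTM. I followed the order of suggestions in an article I had read earlier to build the model from scratch, which was a good chance to put them into practice. Following the first suggestion, I built this model: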
class Classifier(nn.Module):
    def __init__(self, input_dim, output_dim=41, hidden_layers=1, hidden_dim=256):
        super(Classifier, self).__init__()

        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=hidden_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.fc(x)
        return x
Here layers=2 and dim=1024, and the model successfully overfit a [10000, 2000] subset of the data:
notion image
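I also tried every combination of layers=2, 3 and dim=512, 1024, 2048, and found that the chosen combination gave the train_loss curve that most resembled a log function, consistent with the suggestion.
Following the fifth and sixth suggestions, I set the learning rate to 1e-4 and used Adam with CosineAnnealingLR. Following the seventh suggestion, I used gradient clipping...
There was too much to reference here! Go read the blog post yourself...
After running overnight, the result was disappointing: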
notion image
I then consulted a large number of blog posts and articles online, and ended up deepening the model further:
import torch.nn as nn
import torch.nn.init as init

class BasicBlock(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(BasicBlock, self).__init__()

        self.block = nn.Sequential(
            nn.Linear(input_dim, output_dim),
            nn.ReLU(),
            nn.BatchNorm1d(output_dim),
            nn.Dropout(0.25),
        )

    def forward(self, x):
        x = self.block(x)
        return x


class Classifier(nn.Module):
    def __init__(self, input_dim, output_dim=41, hidden_layers=1, hidden_dim=256):
        super(Classifier, self).__init__()


        self.hidden_layers = 5
        self.hidden_dim = 512
        self.input_dim = 39

        self.lstm = nn.LSTM(self.input_dim, self.hidden_dim, num_layers=self.hidden_layers,dropout=0.25,batch_first=True,bidirectional=True)
        self.norm = nn.LayerNorm(self.hidden_dim * 2)
        self.relu = nn.ReLU()

        self.fc = nn.Sequential(
            BasicBlock(self.hidden_dim * 2, hidden_dim),
            *[BasicBlock(hidden_dim, hidden_dim) for _ in range(hidden_layers)],
            nn.Linear(hidden_dim, output_dim),
        )

        self.dropout = nn.Dropout(0.25)
        self.init_weights()

    def init_weights(self):
        for name, param in self.lstm.named_parameters():
            if 'weight_ih' in name:  # input to hidden weights
                init.xavier_uniform_(param.data)
            elif 'weight_hh' in name:  # hidden to hidden weights
                init.orthogonal_(param.data)
            elif 'bias' in name:  # biases
                init.zeros_(param.data)
            else:
                init.kaiming_uniform_(param.data)  # torch.nn.init has no he_uniform_; He init is kaiming_uniform_

    def forward(self, x):
        x = x.view(x.shape[0], concat_nframes, 39)
        x, _ = self.lstm(x)
        x = x[:, -1]
        x = self.relu(x)
        x = self.norm(x)
        x = self.dropout(x)
        x = self.fc(x)
        return x
Following the second suggestion, I tuned on small subsets of [60000, 3000] and [80000, 4000], and settled on these hyperparameters:
# data parameters
concat_nframes = 81              # the number of frames to concat with, n must be odd (total 2k+1 = n frames)
train_ratio = 0.95               # the ratio of data used for training, the rest will be used for validation

# training parameters
seed = 1213                        # random seed
batch_size = 256                # batch size
num_epoch = 20                   # the number of training epoch
learning_rate = 2e-4         # learning rate
model_path = './model.ckpt'     # the path where the checkpoint will be saved

# model parameters
input_dim = 39 * concat_nframes # the input dim of the model, you should not change the value
hidden_layers = 4               # the number of hidden layers
hidden_dim = 1024                # the hidden dim

optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer,T_0=2,T_mult=2,eta_min=0.1 * learning_rate)
The resulting loss curve:
notion image
As you can see, the model fits well with these parameters. After 20 epochs (about 16 hours), this was the final result:
notion image
Unfortunately, it still did not pass the boss baseline. The loss curve:
notion image
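The loss had not fully converged by the end (the train loss had not even crossed the val loss yet), so I decided to run a few more epochs. After 5 more epochs the model had converged, but resubmitting brought no improvement: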
notion image
That was a pity, since it was so close. But retraining from scratch would take too long and would not teach me much more, so I left it there!

Homework3

Assignment Overview

Use a CNN to classify food images into 11 classes.
Grading baselines:
notion image

Tuning Log

【Simple baseline】

As before, just get the sample code running:
notion image

【Medium baseline】

According to the hints, we first need some image augmentation. I am also recording Report 1 here:
homework_tfm = transforms.Compose([transforms.RandomGrayscale(),
                transforms.RandomResizedCrop(128,(0.1, 1),(0.5, 2)),
                transforms.RandomHorizontalFlip(),
                transforms.RandomVerticalFlip(),
                transforms.ColorJitter(0.5, 0.5, 0.5, 0.3),
                transforms.GaussianBlur(7)])
init = transforms.Resize((128, 128))
img = Image.open('/kaggle/input/ml2023spring-hw3/train/0_0.jpg')
img = init(img)
display(img)
for _ in range(5):
    display(homework_tfm(img))
The effect:
notion image
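Honestly, I struggle to make out the images after these transforms myself; I wonder whether the machine can really understand them...
After more than 70 epochs, it did not pass the baseline: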
notion image
I suspected my image transforms were the problem. After looking up some material online (https://zhuanlan.zhihu.com/p/430563265), I chose this scheme instead:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0), ratio=(3. / 4., 4. / 3.)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize
 ])

test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
 ])
The result improved dramatically:
notion image

【Strong Baseline】

According to the hints, pick a predefined model to train; I used ResNet18 here. It helped, but not by much:
notion image
I then tried ResNet34 and ResNet50, but they brought only slight improvements:
notion image
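A side note: when I first defined the model, the official documentation did not say that num_classes had to be specified, so I simply ignored it. It turned out that num_classes defaults to 1000! That means all the models in the figure above were trained with 1000 output classes. Once I noticed, I immediately added the parameter to the model definition and retrained. The result turned out to be about the same, but it was close to the strong baseline: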

【Boss Baseline】

The most dramatic moment came when I went looking for a better model on top of the strong baseline. I chose EfficientNet-B3, and the result went straight past the boss baseline:
notion image
So I went through the original EfficientNet paper and continued the experiment with the more powerful B4 model:
notion image
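It improved on the private leaderboard as well.
In the face of a truly powerful model, tricks like cross validation and TTA all seem insignificant...
And so I stumbled past the boss baseline and moved straight on to the next task ^^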

Homework4

Assignment Overview

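Speaker identification framed as multiclass classification: the goal is to predict the speaker's identity class from the given speech data. In this task, you build a model on features of the speech signal to identify the different speakers.
Grading baselines: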
notion image

Tuning Log

【Simple Baseline】

Just get the original code running:
notion image

【Medium/Strong Baseline】

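According to the assignment hints, we need to adjust the number of self-attention layers in the transformer and the hidden size. I chose the settings from the original Attention Is All You Need paper. The code: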
class Classifier(nn.Module):
	def __init__(self, d_model=512, n_spks=600, dropout=0.1):
		super().__init__()
		# Project the dimension of features from that of input into d_model.
		self.prenet = nn.Linear(40, d_model)
		# TODO:
		#   Change Transformer to Conformer.
		#   https://arxiv.org/abs/2005.08100
		self.encoder_layer = nn.TransformerEncoderLayer(
			d_model=d_model, dim_feedforward=2048, nhead=8, batch_first=True
		)
		self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=8)

		# Project the dimension of features from d_model into speaker nums.
		self.pred_layer = nn.Sequential(
			nn.Linear(d_model, d_model),
			nn.Sigmoid(),
			nn.Linear(d_model, n_spks),
		)
The result was still some way from the medium baseline:
notion image
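Since this is all about tuning anyway, I looked into a good automatic hyperparameter tuning framework, Optuna, and ran an optimization with its default search method and median pruning, which picked out a good set of hyperparameters: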
class Classifier(nn.Module):
	def __init__(self, d_model=1024, n_spks=600, dropout=0.4):
		super().__init__()
		# Project the dimension of features from that of input into d_model.
		self.prenet = nn.Linear(40, d_model)
		# TODO:
		#   Change Transformer to Conformer.
		#   https://arxiv.org/abs/2005.08100
		self.encoder_layer = nn.TransformerEncoderLayer(
			d_model=d_model, dim_feedforward=512, nhead=16, batch_first=True
		)
		self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=8)

		# Project the dimension of features from d_model into speaker nums.
		self.pred_layer = nn.Sequential(
			nn.Linear(d_model, d_model * 2),
			nn.Sigmoid(),
            nn.Dropout(dropout),
			nn.Linear(d_model * 2, n_spks),
		)
config = {
        "data_dir": "/kaggle/input/ml2023springhw4/Dataset",
        "save_path": "model.ckpt",
        "batch_size": 32,
        "n_workers": 2,
        "valid_steps": 2000,
        "warmup_steps": 2000,
        "save_steps": 10000,
        "total_steps": 100000,
}

criterion = nn.CrossEntropyLoss()
optimizer = AdamW(model.parameters(), lr=4e-5, weight_decay=7e-8)
scheduler = get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)
The result was very good; it passed the strong baseline in one go:
notion image
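The scheduler in the config above, get_cosine_schedule_with_warmup, scales the base learning rate by a factor that rises linearly during warmup and then follows a cosine decay. A dependency-free sketch of such a factor is given here; the function name lr_multiplier and the num_cycles default are assumptions of this sketch, so treat it as an illustration rather than the exact helper from the sample code.

```python
import math

def lr_multiplier(step, warmup_steps, total_steps, num_cycles=0.5):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, warmup_steps)
    # Cosine decay from 1 down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))

# Settings from the config above: lr=4e-5, warmup_steps=2000, total_steps=100000.
base_lr = 4e-5
checkpoints = [0, 1000, 2000, 51000, 100000]
lrs = [base_lr * lr_multiplier(s, warmup_steps=2000, total_steps=100000) for s in checkpoints]
```

Under these settings the rate starts at zero, peaks at 4e-5 once warmup ends at step 2000, and then falls back toward zero by step 100000. The warmup keeps the earliest updates small while the attention layers are still close to their random initialization.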

【Boss Baseline】

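According to the TA's hints, the transformer needs to be replaced with a conformer. I used a PyTorch conformer implementation written by someone on GitHub; after installing it with pip, you can import it directly. I again used Optuna for the hyperparameter search. The code and parameter settings: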
from conformer import Conformer
class Classifier(nn.Module):
    def __init__(self, d_model=512, n_spks=600, dropout=0.3, nhead=16, ff_mult=4, conv_expansion_factor=8, num_layers=4):
        super().__init__()
        # Project the dimension of features from that of input into d_model.
        self.prenet = nn.Linear(40, d_model)
        # TODO:
        #   Change Transformer to Conformer.
        #   https://arxiv.org/abs/2005.08100
        # self.encoder_layer = nn.TransformerEncoderLayer(
        # 	d_model=d_model, dim_feedforward=dim_feedforward, nhead=nhead, batch_first=True
        # )
        # self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_layers)
        self.conformer_block = Conformer(dim=d_model, depth=num_layers, dim_head=(d_model//nhead), heads=nhead,
                                      ff_mult=ff_mult, conv_expansion_factor=conv_expansion_factor, attn_dropout=dropout,
                                      ff_dropout=dropout, conv_dropout=dropout)
config = {
        "data_dir": "/kaggle/input/ml2023springhw4/Dataset",
        "save_path": "model.ckpt",
        "batch_size": 32,
        "n_workers": 2,
        "valid_steps": 2000,
        "warmup_steps": 2000,
        "save_steps": 10000,
        "total_steps": 100000,
}

criterion = nn.CrossEntropyLoss()
optimizer = AdamW(model.parameters(), lr=4e-5, weight_decay=7e-8)
scheduler = get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)
The final result reached the boss baseline outright! None of the other tricks (such as AMSoftmax) were needed:
notion image
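The hyperparameter searches for this homework relied on median pruning. The idea behind that rule can be sketched without any dependencies: a trial is stopped early when its best score so far is worse than the median of what earlier trials had reached at the same step. The function name should_prune, the startup threshold, and the accuracy values are all hypothetical; this is a simplified illustration, not Optuna's code.

```python
from statistics import median

def should_prune(current_best, previous_at_step, n_startup_trials=5, maximize=True):
    """Median rule: stop a trial that lags behind the median of earlier trials."""
    if len(previous_at_step) < n_startup_trials:
        return False  # not enough history yet, let the trial run
    m = median(previous_at_step)
    return current_best < m if maximize else current_best > m

# Hypothetical validation accuracies of five earlier trials at the same step.
history = [0.41, 0.55, 0.48, 0.60, 0.52]

prune_weak = should_prune(0.45, history)    # below the median of 0.52
prune_strong = should_prune(0.58, history)  # above the median
```

With runs as long as the 100000 steps configured here, cutting off weak trials early is what makes it affordable to search over several hyperparameters at once.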
 

Summary

Kaggle apparently no longer allows joining the competitions for hw5 and later, so I will stop here for now!