PyTorch in Practice, Lecture 2: Linear Regression and Softmax

Susprin14138

2024-09-27

import random
import torch

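Generating the dataset
Generate a noisy dataset with feature matrix $X\in\mathbb{R}^{1000\times 2}$,
using the linear model parameters $w=[2,-3.4]^\top$, $b=4.2$
and an irreducible error term $\epsilon$ to produce the features and their labels: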
$$y=Xw+b+\epsilon$$

Assume the noise $\epsilon$ follows a normal distribution with mean 0 and standard deviation 0.01.

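# Synthetic dataset
def synthetic_data(w,b,num_examples):
    '''Generate y = Xw + b + noise.'''
    X=torch.normal(0,1,(num_examples,len(w)))  # features drawn from N(0, 1)
    y=torch.matmul(X,w)+b
    y+=torch.normal(0,0.01,y.shape)  # additive noise with standard deviation 0.01
    return X,y.reshape((-1,1))  # labels as a column vector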

true_w=torch.tensor([2,-3.4])
true_b=4.2
features,labels=synthetic_data(true_w,true_b,1000)

print(f"features:{features[0]},labels:{labels[0]}")

features:tensor([-0.9688, -1.7488]),labels:tensor([8.2107])
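
As a quick sanity check on the generated data, the sketch below rebuilds a dataset with the same recipe and measures the residual between the labels and the noise-free prediction. If the recipe is right, the residual should have a mean near 0 and a standard deviation near 0.01. The fixed seed and the name `residual` are illustrative assumptions, not part of the original lecture.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise, following the lecture's recipe."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

# Residual = label minus the noise-free linear prediction, i.e. just the noise term.
residual = labels - (torch.matmul(features, true_w) + true_b).reshape((-1, 1))
print(features.shape, labels.shape)
print(f"residual mean: {residual.mean().item():.4f}, std: {residual.std().item():.4f}")
```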

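Reading the dataset
Define a function data_iter that takes the batch size, the feature matrix, and the label vector as input. It shuffles the examples and yields one minibatch of batch_size examples at a time, which is then used to update the model.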

def data_iter(batch_size,features,labels):
    num_examples=len(features)
    indices=list(range(num_examples))
    random.shuffle(indices)
    for i in range(0,num_examples,batch_size):
        batch_indices=torch.tensor(indices[i:min(i+batch_size,num_examples)])
        yield features[batch_indices],labels[batch_indices]

for X,y in data_iter(10,features,labels):
    print(X,'\n',y)
    break

tensor([[-0.1790, -2.1731],
        [-1.2023,  2.9576],
        [-0.9190,  1.0362],
        [-1.0460, -0.9962],
        [-2.3486, -0.2128],
        [ 1.1386,  0.8077],
        [ 0.0452,  0.6301],
        [-0.9169,  1.4352],
        [ 2.7504,  1.3977],
        [ 0.7046, -0.5593]]) 
 tensor([[11.2521],
        [-8.2572],
        [-1.1725],
        [ 5.5016],
        [ 0.2462],
        [ 3.7252],
        [ 2.1457],
        [-2.5164],
        [ 4.9346],
        [ 7.5260]])
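
For comparison, PyTorch ships utilities that perform the same shuffling and batching. The sketch below is an alternative to the hand-written data_iter, not the lecture's implementation. The tensors here are random stand-ins with the same shapes as the features and labels generated earlier.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with the same shapes as the lecture's features and labels.
features = torch.normal(0, 1, (1000, 2))
labels = torch.normal(0, 1, (1000, 1))

# TensorDataset pairs each feature row with its label;
# DataLoader shuffles the examples and groups them into minibatches.
dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=10, shuffle=True)

X, y = next(iter(loader))   # first minibatch
num_batches = len(loader)   # number of minibatches per pass over the data
print(X.shape, y.shape, num_batches)
```

With shuffle=True the order is reshuffled on every pass, which matches the behavior of calling random.shuffle at the start of data_iter.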

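Initializing the model parameters
Initialize the weights w with random numbers sampled from a normal distribution with mean 0 and standard deviation 0.01,
and initialize the bias b to 0.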

w=torch.normal(0,0.01,size=(2,1),requires_grad=True)
b=torch.zeros(1,requires_grad=True)

Defining the model

def my_linear(X,w,b):
    return torch.matmul(X,w)+b

Loss function:

def squared_loss(y_hat,y):
    return (y_hat-y.reshape(y_hat.shape))**2/2
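
The reshape inside squared_loss matters because predictions have shape (n, 1) while labels may arrive as a flat vector of shape (n,). The small sketch below, with made-up numbers chosen for illustration, shows the per-example losses and what goes wrong when the reshape is omitted.

```python
import torch

def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

y_hat = torch.tensor([[1.0], [2.0], [3.0]])  # predictions, shape (3, 1)
y = torch.tensor([1.0, 0.0, 5.0])            # labels, shape (3,)

loss = squared_loss(y_hat, y)
print(loss.shape)      # one loss value per example
print(loss.flatten())  # (1-1)^2/2, (2-0)^2/2, (3-5)^2/2

# Without the reshape, (3, 1) - (3,) broadcasts to a (3, 3) matrix,
# silently pairing every prediction with every label.
wrong = (y_hat - y) ** 2 / 2
print(wrong.shape)
```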

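SGD (stochastic gradient descent): at each step, randomly draw a minibatch from the dataset, compute the gradient of the loss with respect to the parameters, and update the parameters in the direction in which the loss decreases fastest.

The sgd function takes the parameters params, the learning rate lr, and the batch size batch_size as input.
The update runs inside the torch.no_grad() context manager, so the in-place parameter update is not recorded by autograd.
The size of each update step is determined by the learning rate.

Because the loss is computed as a sum over a minibatch of examples,
the step is divided by batch_size to normalize it,
so that the step size does not depend on the choice of batch size.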
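
Written out in the usual minibatch notation, the update that sgd applies to each parameter is shown below. The symbols $\eta$ (learning rate), $\mathcal{B}$ (the minibatch), and $l^{(i)}$ (the loss on example $i$) are introduced here for illustration and do not appear in the original text.

$$w \leftarrow w-\frac{\eta}{|\mathcal{B}|}\sum_{i\in\mathcal{B}}\nabla_{w}\,l^{(i)}(w,b),\qquad b \leftarrow b-\frac{\eta}{|\mathcal{B}|}\sum_{i\in\mathcal{B}}\nabla_{b}\,l^{(i)}(w,b)$$

The sum of per-example gradients is what param.grad holds after backward is called on the summed loss, and $|\mathcal{B}|$ corresponds to batch_size.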

def sgd(params,lr,batch_size):
    with torch.no_grad():
        for param in params:
            param-=lr*param.grad/batch_size
            param.grad.zero_()  # reset the gradient to zero
            # By default, new gradients are accumulated onto the existing ones.
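
To see why the gradient has to be reset, the following minimal sketch calls backward twice on a one-element parameter without zeroing in between, then performs one manual update in the same style as sgd. The values of lr and batch_size are arbitrary choices for illustration.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()     # d(w^2)/dw = 2w = 2

(w ** 2).sum().backward()  # second backward without zeroing the gradient
second = w.grad.clone()    # accumulated: 2 + 2 = 4

# One manual update step in the style of sgd.
lr, batch_size = 0.1, 1
with torch.no_grad():
    w -= lr * w.grad / batch_size  # 1 - 0.1 * 4 = 0.6
    w.grad.zero_()                 # clear the gradient for the next step
print(first, second, w)
```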

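Training is an iterative process:
1. Initialize the parameters.
2. Repeat the following until training is complete:
2.1 Compute the gradient.
2.2 Update the parameters.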

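In each epoch, data_iter is used to traverse the entire dataset.

Note: num_epochs and the learning rate lr are both hyperparameters, that is, settings that are chosen by hand, usually through repeated experiments, rather than learned from the data.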

lr=0.03
num_epochs=3
loss=squared_loss

batch_size=10

for epoch in range(num_epochs):
    for X,y in data_iter(batch_size,features,labels):
        l=loss(my_linear(X,w,b),y)
        l.sum().backward()
        sgd([w, b],lr,batch_size)
    with torch.no_grad():
        train_1=loss(my_linear(features,w,b),labels)
        print(f'epoch{epoch+1},loss{float(train_1.mean()):f}')

epoch1,loss0.004560
epoch2,loss0.000067
epoch3,loss0.000056

print(f'Estimation error of w: {true_w-w.reshape(true_w.shape)}')
print(f'Estimation error of b: {true_b-b}')

Estimation error of w: tensor([5.7697e-05, 2.4891e-04], grad_fn=<SubBackward0>)
Estimation error of b: tensor([0.0003], grad_fn=<RsubBackward1>)
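
Because this model is ordinary least squares, the SGD estimates can be cross-checked against a direct least-squares solve. This is a different technique from the minibatch SGD used in the lecture and is shown only as a sanity check. The sketch regenerates its own data with a fixed seed (an assumption made here for reproducibility), so its numbers will not match the output above exactly.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed for reproducibility

true_w = torch.tensor([2, -3.4])
true_b = 4.2
X = torch.normal(0, 1, (1000, 2))
noise = torch.normal(0, 0.01, (1000,))
y = (torch.matmul(X, true_w) + true_b + noise).reshape((-1, 1))

# Append a column of ones so the bias is estimated as a third coefficient.
A = torch.cat([X, torch.ones(len(X), 1)], dim=1)
solution = torch.linalg.lstsq(A, y).solution  # shape (3, 1): [w1, w2, b]

w_hat = solution[:2].flatten()
b_hat = solution[2].item()
print(f"w error: {true_w - w_hat}, b error: {true_b - b_hat:.6f}")
```

If SGD has converged, its estimation errors should be of a similar order of magnitude to the ones printed here.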