Constructing parameter groups in pytorch

Question

In the torch.optim documentation, it is stated that model parameters can be grouped and optimized with different optimization hyperparameters. It says:

For example, this is very useful when one wants to specify per-layer learning rates:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
This means that model.base’s parameters will use the default learning rate of 1e-2, model.classifier’s parameters will use a learning rate of 1e-3, and a momentum of 0.9 will be used for all parameters.

I was wondering how to define such groups so that they expose a parameters() method. What came to mind was something like this:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.base()
        self.classifier()

        self.relu = nn.ReLU()

    def base(self):
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)

    def classifier(self):
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)

    def forward(self, y0):

        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))

        return self.fc4(y3)

How should I modify the snippet above to be able to call model.base.parameters()? Is the only way to define an nn.ParameterList and explicitly add the weights and biases of the desired layers to that list? What is the best practice?

Ivan · Accepted Answer · 2021-11-13 16:15:15Z

11

I will show three approaches to solving this. In the end though, it comes down to personal preference.

- Grouping parameters with `nn.ModuleDict`.

Another answer here uses nn.Sequential to group the layers, which lets you target different sections of the model through each nn.Sequential's parameters() method. However, base and classifier might be more than a plain sequence of layers. I believe a more general approach is to leave the module as is and add an nn.ModuleDict that holds the layers of each optimization group in a separate nn.ModuleList:

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()

        self.relu = nn.ReLU()  # needed by forward()
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)

        # Same layers, grouped by optimization group (no new parameters).
        self.params = nn.ModuleDict({
            'base': nn.ModuleList([self.fc1, self.fc2]),
            'classifier': nn.ModuleList([self.fc3, self.fc4])})

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)

Then you can define your optimizer with:

optim.SGD([
    {'params': model.params.base.parameters()},
    {'params': model.params.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

Note that model.parameters() will not yield duplicate parameters, even though each layer is registered twice (once directly on the model and once through self.params).
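As a quick check of the point about duplicates, here is a minimal sketch that rebuilds the model above (forward omitted for brevity) and counts what parameters() yields. The variable names n_unique and keys are only illustrative. It also shows one side effect worth knowing: state_dict() is keyed by attribute path, so the shared tensors are listed under both paths.

```python
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)
        # Same layers registered a second time, grouped for the optimizer.
        self.params = nn.ModuleDict({
            'base': nn.ModuleList([self.fc1, self.fc2]),
            'classifier': nn.ModuleList([self.fc3, self.fc4])})

model = MyModel()

# parameters() skips tensors it has already yielded, so each weight and
# bias is counted once: 4 layers x (weight + bias).
n_unique = len(list(model.parameters()))

# state_dict() does not deduplicate: the same tensor shows up under
# 'fc1.weight' and under 'params.base.0.weight'.
keys = list(model.state_dict().keys())

print(n_unique, len(keys))
```

If the duplicated state_dict entries are a concern for checkpoints, the interface-based or submodule-based approaches avoid registering the layers twice.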

- Creating an interface for accessing parameter groups.

A different solution is to provide an interface in the nn.Module to separate the parameters into groups:

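from itertools import chain

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()

        self.relu = nn.ReLU()  # needed by forward()
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)

    def forward(self, y0):
        y1 = self.relu(self.fc1(y0))
        y2 = self.relu(self.fc2(y1))
        y3 = self.relu(self.fc3(y2))
        return self.fc4(y3)

    def base_params(self):
        # from_iterable flattens the per-layer generators into one
        # iterator of parameters; plain chain(...) would yield generators.
        return chain.from_iterable(m.parameters() for m in [self.fc1, self.fc2])

    def classifier_params(self):
        return chain.from_iterable(m.parameters() for m in [self.fc3, self.fc4])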

This assumes chain has been imported with from itertools import chain.

Then define your optimizer with:

optim.SGD([
    {'params': model.base_params()},
    {'params': model.classifier_params(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
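To sanity-check this kind of interface, the sketch below confirms that each group method yields actual nn.Parameter objects and that the optimizer ends up with two groups of four tensors each. It uses chain.from_iterable to flatten the per-layer generators; the forward pass is omitted, and the names base, classifier and group_sizes are only illustrative.

```python
from itertools import chain

import torch.nn as nn
import torch.optim as optim

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)

    def base_params(self):
        # Flatten the per-layer generators into one iterator of parameters.
        return chain.from_iterable(m.parameters() for m in [self.fc1, self.fc2])

    def classifier_params(self):
        return chain.from_iterable(m.parameters() for m in [self.fc3, self.fc4])

model = MyModel()

# Each call returns a fresh iterator, so it is safe to consume one here
# and request another for the optimizer below.
base = list(model.base_params())
classifier = list(model.classifier_params())
all_params = all(isinstance(p, nn.Parameter) for p in base + classifier)

optimizer = optim.SGD([
    {'params': model.base_params()},
    {'params': model.classifier_params(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

# The optimizer materializes each iterator into a list of tensors.
group_sizes = [len(g['params']) for g in optimizer.param_groups]
print(all_params, group_sizes)
```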

- Using child `nn.Module`s.

Lastly, you can define the sections of your model as submodules. Here this amounts to the same approach as the nn.Sequential answer, but it generalizes to any kind of submodule.

class Base(nn.Sequential):
    def __init__(self):
        super().__init__(nn.Linear(1, 512),
                         nn.ReLU(),
                         nn.Linear(512, 264),
                         nn.ReLU())

class Classifier(nn.Sequential):
    def __init__(self):
        super().__init__(nn.Linear(264, 128),
                         nn.ReLU(),
                         nn.Linear(128, 964))

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()

        self.base = Base()
        self.classifier = Classifier()

    def forward(self, y0):
        features = self.base(y0)
        out = self.classifier(features)
        return out

Here the optimizer is defined exactly as in the documentation example:

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

I would argue this is the best practice. However, it forces you to define each component as a separate nn.Module, which can be a hassle when experimenting with more complex models.

edited Nov 13, 2021 at 16:15

answered Oct 30, 2021 at 10:13

Ivan

4 Comments

Blade Over a year ago

Are you able to run the first method? OrderedDict seems to be an attribute of collections, not torch.nn. Then, how does model have attributes base and classifier?

Ivan Over a year ago

@Blade Indeed, sorry for the confusion. In the first method, model.params is a collections.OrderedDict and should be accessed via strings, not attributes. Thank you for taking the time to edit the answer.

Blade Over a year ago

Thank you! In case I also have an nn.Parameter that I want to add to one item in the dictionary, what would be the minimal edit needed on the first method? E.g. how should I edit 'base': nn.ModuleList([self.fc1, self.fc2]) to include the new nn.Parameter?

Blade Over a year ago

Let me add that my own solution would be to change the nn.ModuleList to nn.ParameterList and add .weight and .bias for each layer. Can this be done more efficiently or more cleanly?
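One possible way to handle the case raised in these comments, sketched here with the interface-based approach rather than the dictionary: return a flat list that combines the layers' parameters with the standalone nn.Parameter. The scale attribute is hypothetical and only stands in for whatever extra parameter is needed; the forward pass is omitted.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 512)
        self.fc2 = nn.Linear(512, 264)
        self.fc3 = nn.Linear(264, 128)
        self.fc4 = nn.Linear(128, 964)
        # Hypothetical standalone parameter, not part of the original model.
        self.scale = nn.Parameter(torch.ones(1))

    def base_params(self):
        # Layer parameters plus the standalone parameter in one flat list.
        return [*self.fc1.parameters(), *self.fc2.parameters(), self.scale]

    def classifier_params(self):
        return [*self.fc3.parameters(), *self.fc4.parameters()]

model = MyModel()
optimizer = optim.SGD([
    {'params': model.base_params()},
    {'params': model.classifier_params(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

group_sizes = [len(g['params']) for g in optimizer.param_groups]
scale_in_base = any(p is model.scale for p in optimizer.param_groups[0]['params'])
print(group_sizes, scale_in_base)
```

This avoids listing .weight and .bias by hand for every layer, at the cost of keeping the grouping logic in the model's methods.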

GoodDeeds · Accepted Answer · 2021-10-29 21:10:26Z

1

You can use torch.nn.Sequential to define base and classifier. Your class definition can then be:

class MyModel(nn.Module):

    def __init__(self):
        super(MyModel, self).__init__()
        self.base = nn.Sequential(nn.Linear(1, 512), nn.ReLU(), nn.Linear(512,264), nn.ReLU())
        self.classifier = nn.Sequential(nn.Linear(264,128), nn.ReLU(), nn.Linear(128,964))

    def forward(self, y0):
        return self.classifier(self.base(y0))

Then, you can access parameters using model.base.parameters() and model.classifier.parameters().
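For illustration, the following sketch builds the optimizer from the question on top of this model and inspects optimizer.param_groups to confirm which hyperparameters each group received. The names lrs and momenta are only illustrative.

```python
import torch.nn as nn
import torch.optim as optim

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Sequential(
            nn.Linear(1, 512), nn.ReLU(), nn.Linear(512, 264), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(264, 128), nn.ReLU(), nn.Linear(128, 964))

    def forward(self, y0):
        return self.classifier(self.base(y0))

model = MyModel()
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

# Groups without their own 'lr' fall back to the optimizer-level default;
# momentum is not overridden anywhere, so both groups share it.
lrs = [g['lr'] for g in optimizer.param_groups]
momenta = [g['momentum'] for g in optimizer.param_groups]
print(lrs, momenta)
```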

answered Oct 29, 2021 at 21:10

GoodDeeds
