Commit 285fdd1c authored by Cem Anil

Add submitted code.

# LNets
Implementation and evaluation of Lipschitz neural networks (LNets).
# Installation
* Create a new conda environment and activate it:
conda create -n lnets python=3.5
conda activate lnets
* Install PyTorch, following instructions in ``.
* Install torchnet by:
pip install git+
* Navigate to the root of the project. Install the package, along with requirements:
python install
* Add project root to PYTHONPATH. One way to do this:
**Note on PyTorch version**: All experiments were performed using PyTorch version 0.4, although the code is expected
to run with PyTorch 1.0.
# Models
The code that implements the core ideas presented in the paper is shown below.
├── models
│ └── activations
│ └── "GroupSort activation. "
│ └── "MaxOut and MaxMin activations. "
│ └── layers
│ └── conv
│ └── "Conv layer with Bjorck-orthonormalized filters. "
│ └── "Conv layer with L-infinity projected filters. "
│ └── dense
│ └── "Dense layer with Bjorck-orthonormalized weights. "
│ └── "Dense layer with L-infinity projected weights. "
│ └── "Dense layer with Parseval regularization. "
│ └── "Dense layer with spectral normalization. "
│ └── regularization
│ └──
│ └── "Penalizes the jacobian norm. Description in Appendix of paper. "
│ └── utils
│ └──
│ └── "Converts a Bjorck layer to a regular one for fast test time inference. "
│ └── "Specification of models for a variety of tasks. "
## Configuring Experiments
We tried to put as many variables as possible in a single JSON configuration file for each experiment.
Sample configuration files exist under:
* `lnets/tasks/adversarial/configs`: for adversarial robustness experiments.
* `lnets/tasks/classification/configs`: for classification experiments.
* `lnets/tasks/dualnets/configs`: for Wasserstein distance estimation experiments.
* `lnets/tasks/gan/configs`: for training GANs.
We now describe the key moving parts in these configs and how to change them.
### Model Configuration
``: (string) Chooses the overall architecture and the task/training objective. `lnets/models/` contains
the commonly used model names. Two examples are:
* "dual_fc": Train a fully connected model, under the dual Wasserstein objective.
* "classify_fc": Train a fully connected classifier.
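The source files in this commit include `register_model` and `get_model` helpers, which suggests that model names are resolved through a name-to-loader registry. The following is a minimal, self-contained sketch of that pattern. The `MODEL_REGISTRY = {}` definition, the decorator usage and the placeholder loader body are assumptions made for illustration; they are not copied from the repository.

```python
# Sketch of a name-based model registry (illustrative, not the repo's exact code).
MODEL_REGISTRY = {}  # assumed definition: maps a model name to a loader function


def register_model(model_name):
    """Decorator that stores a loader function under the given name."""
    def decorator(f):
        MODEL_REGISTRY[model_name] = f
        return f
    return decorator


@register_model("classify_fc")
def load_classify_fc(config):
    # Placeholder loader: a real one would build and return a network.
    return ("fully connected classifier", config["model"]["layers"])


def get_model(config):
    """Look up the loader named in config['model']['name'] and call it."""
    name = config["model"]["name"]
    if name not in MODEL_REGISTRY:
        raise ValueError("Unknown model {}".format(name))
    return MODEL_REGISTRY[name](config)


model = get_model({"model": {"name": "classify_fc", "layers": [128, 10]}})
print(model)
```

With this design, supporting a new model name only requires decorating one more loader function; the lookup code stays unchanged.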
`model.activation`: (string) Activation used throughout the network. One of "maxmin", "group_sort", "maxout", "relu", "tanh",
"sigmoid" or "identity" (i.e. no activation).
`model.linear.type`: (string) Chooses which linear layer type is going to be used. If the model is fully connected, the available
options are:
* "standard": The usual linear layer.
* "bjorck": Bjorck orthonormalized - all singular values equal to one.
* "l_inf_projected": Weight matrices are projected to the L-infinity ball.
* "spectral_normal": Use spectral normalization - largest singular value set to 1.
* "parseval_l2": Parseval regularized linear transformation.
If the architecture is fully convolutional, the available options are:
* "standard_conv2d": the standard convolutional layer,
* "bjorck_conv2d": Convolutional layers in which the filters are Bjorck orthonormalized,
* "l_inf_projected_conv2d": Convolutional layers in which the filters are projected to the L-infinity ball.
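To make the "bjorck" option more concrete, here is a minimal NumPy sketch of the first-order Bjorck iteration, which pushes all singular values of a matrix toward one. The function name `bjorck_orthonormalize`, the `iters` and `beta` parameters, and the spectral-norm pre-scaling are assumptions for illustration; the repository's layers may use a different order, scaling or iteration count.

```python
import numpy as np


def bjorck_orthonormalize(w, iters=20, beta=0.5):
    """First-order Bjorck iteration: W <- (1 + beta) * W - beta * W W^T W."""
    # Pre-scale so that the largest singular value is 1, which keeps the
    # iteration inside its convergence region (an assumption of this sketch).
    w = w / np.linalg.norm(w, ord=2)
    for _ in range(iters):
        w = (1 + beta) * w - beta * w @ w.T @ w
    return w


rng = np.random.default_rng(0)
w = bjorck_orthonormalize(rng.standard_normal((4, 4)), iters=100)

# After enough iterations the singular values should all be close to 1.
print(np.linalg.svd(w, compute_uv=False))
```

Each singular value s is mapped to 1.5 s - 0.5 s^3 per step (for `beta=0.5`), so small singular values grow slowly at first. This is why few iterations only approximately enforce orthonormality.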
`model.layers`: (list) Contains how many neurons (or convolutional filters) there should be in each layer.
`model.groupings`: (list) Used by activations that operate on groups of neurons (GroupSort, MaxMin and MaxOut).
Specifies the group size for each layer. For example, `[2, 3]` means the activation acts on groups of 2 in the
first layer and groups of 3 in the second layer.
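As an illustration of what a grouping size means, the sketch below implements a GroupSort-style activation in NumPy: units are split into groups and sorted within each group. With a group size of 2 this coincides with MaxMin. The function name `group_sort`, the use of consecutive units as groups and the ascending sort order are assumptions of this sketch, not a description of the repository's `GroupSort` module.

```python
import numpy as np


def group_sort(x, group_size):
    """Sort activations within consecutive groups along the feature axis."""
    batch, features = x.shape
    if features % group_size != 0:
        raise ValueError("feature count must be divisible by group_size")
    grouped = x.reshape(batch, features // group_size, group_size)
    return np.sort(grouped, axis=-1).reshape(batch, features)


x = np.array([[3.0, 1.0, 2.0, 5.0, 4.0, 0.0]])
print(group_sort(x, 2))  # [[1. 3. 2. 5. 0. 4.]]
print(group_sort(x, 3))  # [[1. 2. 3. 0. 4. 5.]]
```

Sorting only permutes the entries within each group, so the activation preserves the norm of its input, which is what makes it suitable for norm-constrained networks.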
`l_constant`: (integer) Scales the output of each layer by a certain amount such that the network output is scaled by
l_constant. Used to build K-Lipschitz networks out of 1-Lipschitz building blocks.
`per_update_proj` and `per_epoch_proj`: Some algorithms (such as Parseval networks) involve projecting the weights of
networks to a certain manifold after each training update. These fields let the user flexibly choose how often and with
which projection algorithm the weights should be projected. The supported projection algorithms are:
* "l_2": project to L2 ball.
* "l_inf_projected": project to the L-infinity ball.
By default, both per-update and per-epoch projections are set to false.
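The model fields described above can be pictured as a nested structure. The sketch below builds one as a Python dictionary and prints it as JSON. The nesting under `model` and the top-level `cuda` flag follow how the source files access the config (for example `config['model']['name']` and `config['cuda']`). The concrete values are illustrative assumptions and are not taken from any config file in the repository; fields whose nesting is not stated here (such as `l_constant` and the projection settings) are left out.

```python
import json

# Illustrative values only; layer sizes and groupings are assumptions.
config = {
    "cuda": False,
    "model": {
        "name": "classify_fc",
        "activation": "maxmin",
        "linear": {"type": "bjorck", "bias": True},
        "layers": [1024, 1024, 10],
        "groupings": [2, 2, 1],
    },
}

print(json.dumps(config, indent=2))
```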
### Running on GPU
If a GPU is available, we strongly encourage users to enable GPU training by turning on the related JSON field in
the experiment configs. In all experiments, set `"cuda": true` (except for the GAN experiments, for which set
`"gpu_mode": true`). This speeds up training significantly, especially with Bjorck layers.
### Other Configurations
**Configuring optimizer**: Adam, standard SGD, Nesterov momentum and AggMo are supported. Since most of the fields
in the optimizer configurations are self-explanatory, we refer the user to the existing optimizer configurations
in this repo.
**Miscellaneous**: Other fields control other aspects of training, such as IO settings, enabling cuda, logging,
visualizing results etc.
Other task specific configs will be described below under their corresponding titles.
# Tasks
Four tasks are explored: Wasserstein Distance estimation, adversarial robustness, GAN training and classification.
### Wasserstein Distance Estimation
**Configuring Distributions**: The `distrib1` and `distrib2` fields configure the probability distributions used in
the Wasserstein distance estimation tasks. Currently, configs exist for `multi_spherical_shell` (a distribution
consisting of multiple spherical shells living in high dimensions) and `gan_sampler` (samples from the empirical
and generator distributions of a GAN).
#### Quantifying Expressivity using Synthetic Distributions
For some synthetic distributions, the Wasserstein distance and its accompanying dual surface can be computed
analytically. Using these distributions, we can quantify how expressive a Lipschitz architecture is: the closer the
architecture gets to the correct Wasserstein distance, the more expressive it is.
* Approximating Absolute Value
python ./lnets/tasks/dualnets/mains/ ./lnets/tasks/dualnets/configs/absolute_value_experiment.json
* Approximating Three Cones
python ./lnets/tasks/dualnets/mains/ ./lnets/tasks/dualnets/configs/three_cones_experiment.json
* Approximating High Dimensional Cones
python ./lnets/tasks/dualnets/mains/ ./lnets/tasks/dualnets/configs/high_dimensional_cone_experiment.json
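The idea behind these experiments can be illustrated in one dimension without any network. Under the Kantorovich-Rubinstein dual, any 1-Lipschitz function f gives a lower bound E_x[f] - E_y[f] on the Wasserstein-1 distance, and a more expressive function class can get closer to the true value. The sketch below compares a few hand-picked 1-Lipschitz functions against the exact distance between two equally sized 1-D samples. All names (`dual_objective`, `empirical_w1_1d`, the candidate functions) and the Gaussian samples are assumptions chosen for illustration; they are not part of the repository.

```python
import numpy as np


def dual_objective(f, x, y):
    """E_x[f] - E_y[f]; a lower bound on W1 whenever f is 1-Lipschitz."""
    return float(np.mean(f(x)) - np.mean(f(y)))


def empirical_w1_1d(x, y):
    """Exact W1 between two equally sized 1-D samples: match sorted points."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))


rng = np.random.default_rng(0)
x = rng.normal(3.0, 1.0, size=2000)
y = rng.normal(0.0, 1.0, size=2000)

# Three 1-Lipschitz candidates of varying quality.
candidates = {
    "identity": lambda t: t,
    "abs": np.abs,
    "half_slope": lambda t: 0.5 * t,
}
bounds = {name: dual_objective(f, x, y) for name, f in candidates.items()}
w1 = empirical_w1_1d(x, y)

for name, value in bounds.items():
    print("{}: lower bound {:.3f} (exact W1 {:.3f})".format(name, value, w1))
```

The gap between a candidate's lower bound and the exact distance plays the role of the expressivity measure: a trained Lipschitz network replaces the hand-picked candidates.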
#### Wasserstein Distance between GAN Generator and Empirical Distributions
First, we need to train a GAN so that we can use its generator network for the Wasserstein distance estimation.
* GAN training
(defaults to training WGAN on MNIST)
python ./lnets/tasks/gan/mains/ ./lnets/tasks/gan/configs/train_GAN.json
The GAN type and the training set (along with other training hyperparameters) can be changed:
`gan_type`: One of "WGAN", "WGAN_GP" or "LWGAN" (where the discriminator consists of this paper's contributions -
more on this later)
`dataset`: One of "mnist", "fashion-mnist", "cifar10", "svhn", "stl10", "lsun-bed"
* Estimating Wasserstein Distance
python ./lnets/tasks/dualnets/mains/ ./lnets/tasks/dualnets/configs/estimate_wde_gan.json
In order to sample from the GAN trained in the above step, we need to modify the config used for Wasserstein distance estimation:
`distrib1.gan_config_json_path`: Path to the gan training config used in the first step.
One can then modify the model to see which Lipschitz architectures obtain a tighter lower bound on the Wasserstein
distance between the generator and empirical data distribution.
### Training LWGAN (Lipschitz WGANs)
We can use the same WGAN training methodology, but build the discriminator network from our methods (i.e. Bjorck
orthonormalized linear transformations and GroupSort activations).
#### Training LWGAN:
python ./lnets/tasks/gan/mains/ ./lnets/tasks/gan/configs/train_LWGAN.json
The training set (along with other training hyperparameters) can be changed:
`dataset`: One of "mnist", "fashion-mnist", "cifar10", "svhn", "stl10", "lsun-bed"
### Classification
#### Classification on Standard Datasets
* Training a standard, fully connected classifier.
python ./lnets/tasks/classification/mains/ ./lnets/tasks/classification/configs/standard/fc_classification.json
* Training Bjorck Lipschitz classifier
python ./lnets/tasks/classification/mains/ ./lnets/tasks/classification/configs/standard/fc_classification_bjorck.json -o model.linear.bjorck_iter=3
Note that this training script uses only a few Bjorck iterations, so the Lipschitz constraint is not strictly
enforced during training. We therefore do additional finetuning afterwards.
* Orthonormal finetuning
python ./lnets/tasks/classification/mains/ --model.exp_path=<>
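The `-o model.linear.bjorck_iter=3` argument used above looks like a dotted-key override of a nested config entry. The sketch below shows one way such an override could be applied to a nested dictionary. The function name `apply_override`, the JSON-based value parsing and the starting config are assumptions for illustration; the repository's own argument handling may differ.

```python
import json


def apply_override(config, assignment):
    """Apply a 'dotted.key=value' override to a nested dict, in place."""
    key, raw = assignment.split("=", 1)
    try:
        value = json.loads(raw)  # parses numbers, booleans, lists, ...
    except json.JSONDecodeError:
        value = raw  # fall back to the raw string
    cur = config
    *parents, leaf = key.split(".")
    for token in parents:
        cur = cur.setdefault(token, {})  # create intermediate levels if missing
    cur[leaf] = value
    return config


config = {"model": {"linear": {"type": "bjorck", "bjorck_iter": 20}}}
apply_override(config, "model.linear.bjorck_iter=3")
print(config["model"]["linear"]["bjorck_iter"])  # 3
```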
#### Classification with Small Data
* Generating data indices: Generate which samples in the dataset will be used for training.
python ./lnets/tasks/classification/mains/ mnist --data.root "data/small_mnist" --data.class_count 10 --per_class_count 100 --val_size 5000
* Training small data classifier.
python ./lnets/tasks/classification/mains/ ./lnets/tasks/classification/configs/small_mnist/lenet_bjorck.json
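The index generation step can be sketched as follows: walk through a random permutation of the dataset, keep a fixed number of samples per class for training, and send later samples of already-full classes to the validation set. This is a simplified, self-contained version operating on a label array. The function name `pick_small_data_indices`, the `seed` parameter and the error handling are assumptions of this sketch and do not reproduce the repository's implementation.

```python
import numpy as np


def pick_small_data_indices(labels, per_class_count, class_count, val_size, seed=0):
    """Pick per-class training indices and disjoint validation indices."""
    labels = np.asarray(labels)
    if per_class_count * class_count + val_size > len(labels):
        raise ValueError("More data points requested than available")
    rng = np.random.default_rng(seed)
    train = {c: [] for c in range(class_count)}
    val = []
    for idx in rng.permutation(len(labels)):
        y = int(labels[idx])
        if len(train[y]) < per_class_count:
            train[y].append(int(idx))
        elif len(val) < val_size:
            val.append(int(idx))
    if any(len(v) < per_class_count for v in train.values()) or len(val) < val_size:
        raise ValueError("Could not fill every class or the validation set")
    train_indices = np.array([train[c] for c in range(class_count)], dtype=np.int32).flatten()
    return train_indices, np.array(val, dtype=np.int32)


# Synthetic labels: 1000 points spread evenly over 10 classes.
labels = np.arange(1000) % 10
train_indices, val_indices = pick_small_data_indices(labels, 5, 10, 20)
print(train_indices.shape, val_indices.shape)
```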
### Adversarial Robustness
For the robustness experiments we trained both the Bjorck orthonormal networks and the L-infinity max-margin networks.
* Training L-Inf Lipschitz margin network
python ./lnets/tasks/classification/mains/ ./lnets/tasks/classification/configs/standard/fc_classification_l_inf_margin.json
* Evaluating robustness of trained classifier
python ./lnets/tasks/adversarial/mains/ --model.exp_path="root/of/above/experiment/results" --output_root="outs/adv_robustness/mnist_l_inf_margin"
## Code References
* ResNet Implementation: Largely based on github/kuangliu -
* GAN training pipeline: Based on and refactored from
import torchvision.transforms as transforms
def get_data_transforms(config):
# train_transform = None
test_transform = None
if == 'cifar':
train_transform, test_transform = get_cifar_transform(config)
elif == 'imagenet':
train_transform, test_transform = get_imagenet_transform(config)
train_transform = transforms.ToTensor()
# Make sure to turn the input images into PyTorch tensors.
if test_transform is None:
test_transform = transforms.ToTensor()
return train_transform, test_transform
def get_cifar_transform(config):
normalize = transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
std=[0.2023, 0.1994, 0.2010])
train_transform = transforms.Compose([
transforms.RandomCrop(32, padding=4),
test_transform = transforms.Compose([
return train_transform, test_transform
def get_imagenet_transform(config):
normalize = transforms.Normalize(,
train_transform = transforms.Compose([
test_transform = transforms.Compose([
return train_transform, test_transform
from import get_datasets
from import save_indices
import argparse
import os
from munch import Munch
def main(opt): = Munch(type='none')
indices_path = os.path.join(,
train_data, _, _ = get_datasets(opt)
save_indices(train_data, indices_path, opt.per_class_count,, opt.val_size)
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Generate data indices. ')
    parser.add_argument('', type=str, metavar='MODELPATH',
                        help="location of pretrained model weights to evaluate")
    parser.add_argument('--data.root', type=str, help='output directory to which results should be saved')
    parser.add_argument('--data.class_count', type=int, help='total number of classes in dataset')
    parser.add_argument('--per_class_count', type=int, help="How many training data points per class")
    parser.add_argument('--val_size', type=int, help="Total number of validation points")
    args = vars(parser.parse_args())
    # Turn dotted argument names (e.g. "data.root") into a nested dictionary.
    opt = {}
    for k, v in args.items():
        cur = opt
        tokens = k.split('.')
        for token in tokens[:-1]:
            if token not in cur:
                cur[token] = {}
            cur = cur[token]
        cur[tokens[-1]] = v
import os
from import Subset, DataLoader
import torchvision.datasets as datasets
from import get_data_transforms
from import load_indices
def get_datasets(config):
    data_name = config['data']['name'].lower()
    path = os.path.join(config['data']['root'], data_name)
    train_transform, test_transform = get_data_transforms(config)
    train_data_args = dict(download=True, transform=train_transform)
    val_data_args = dict(download=True, transform=test_transform)
    test_data_args = dict(train=False, download=True, transform=test_transform)
    if data_name == 'mnist':
        train_data = datasets.MNIST(path, **train_data_args)
        val_data = datasets.MNIST(path, **val_data_args)
        test_data = datasets.MNIST(path, **test_data_args)
    elif data_name == 'cifar10':
        train_data = datasets.CIFAR10(path, **train_data_args)
        val_data = datasets.CIFAR10(path, **val_data_args)
        test_data = datasets.CIFAR10(path, **test_data_args)
    elif data_name == 'cifar100':
        train_data = datasets.CIFAR100(path, **train_data_args)
        val_data = datasets.CIFAR100(path, **val_data_args)
        test_data = datasets.CIFAR100(path, **test_data_args)
    elif data_name == 'fashion-mnist':
        train_data = datasets.FashionMNIST(path, **train_data_args)
        val_data = datasets.FashionMNIST(path, **val_data_args)
        test_data = datasets.FashionMNIST(path, **test_data_args)
    elif data_name == 'imagenet-torchvision':
        train_data = datasets.ImageFolder(os.path.join(path, 'train'), transform=train_transform)
        val_data = datasets.ImageFolder(os.path.join(path, 'valid'), transform=test_transform)
        # Currently not loaded.
        test_data = None
    else:
        raise NotImplementedError('Data name %s not supported' % data_name)
    return train_data, val_data, test_data
def build_loaders(config, train_data, val_data, test_data):
    data_name = config['data']['name'].lower()
    batch_size = config['optim']['batch_size']
    num_workers = config['data']['num_workers']
    if config['data']['indices_path'] is not None:
        train_indices, val_indices = load_indices(config['data']['indices_path'], config['data']['per_class_count'])
        train_data = Subset(train_data, train_indices)
        val_data = Subset(val_data, val_indices)
    elif data_name != 'imagenet-torchvision':
        # Manually readjust train/val size for memory saving.
        data_size = len(train_data)
        train_size = int(data_size * config['data']['train_size'])
        train_data.train_data = train_data.train_data[:train_size]
        train_data.train_labels = train_data.train_labels[:train_size]
        if config['data']['train_size'] != 1:
            val_data.train_data = val_data.train_data[train_size:]
            val_data.train_labels = val_data.train_labels[train_size:]
        else:
            val_data = None
    loaders = {
        'train': DataLoader(train_data, batch_size=batch_size, shuffle=True, num_workers=num_workers),
        'validation': DataLoader(val_data, batch_size=batch_size, num_workers=num_workers),
        'test': DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)
    }
    return loaders
def load_data(config):
    train_data, val_data, test_data = get_datasets(config)
    return build_loaders(config, train_data, val_data, test_data)
import numpy as np
def get_small_data_indices(dataset, total_per_class, class_count, val_size):
total_points = len(dataset)
if total_per_class * class_count + val_size > total_points:
raise Exception('More data points requested than is in data')
random_indices = np.random.permutation(total_points)
small_data_indices = {}
val_indices = []
for c in range(class_count):
small_data_indices[c] = []
for idx in random_indices:
_, y = dataset[idx]
y = int(y.item())
if len(small_data_indices[y]) < total_per_class:
elif len(val_indices) < val_size:
if all([len(small_data_indices[c]) == total_per_class for c in range(class_count)]):
if len(val_indices) == val_size:
if not all([len(small_data_indices[c]) == total_per_class for c in range(class_count)]):
raise Warning('Uneven class counts in small data indices')
return np.array([small_data_indices[c] for c in
range(class_count)]).astype(np.int32).flatten(), np.array(val_indices).astype(np.int32)
from import get_small_data_indices
import numpy as np
import os
def save_indices(dataset, indices_path, per_class_count, total_class_count, val_size):
    train_indices, val_indices = get_small_data_indices(dataset, per_class_count, total_class_count, val_size)
    np.savetxt(os.path.join(indices_path, "train_indices_{}.txt".format(per_class_count)), train_indices)
    np.savetxt(os.path.join(indices_path, "val_indices_{}.txt".format(per_class_count)), val_indices)


def load_indices(path, per_class_count):
    train_indices = os.path.join(path, "train_indices_{}.txt".format(per_class_count))
    val_indices = os.path.join(path, "val_indices_{}.txt".format(per_class_count))
    return np.loadtxt(train_indices, dtype=np.int32), np.loadtxt(val_indices, dtype=np.int32)
from lnets.models.architectures import *
from lnets.models.model_types import *
from lnets.models.layers import *
def register_model(model_name):
    def decorator(f):
        MODEL_REGISTRY[model_name] = f
        return f
    return decorator


def get_model(config):
    model_name = config['model']['name']
    if model_name in MODEL_REGISTRY:
        return MODEL_REGISTRY[model_name](config)
    raise ValueError("Unknown model {:s}".format(model_name))
# Wasserstein Distance Estimation.
def load_fc_dual(config):
    model = FCNet(config.model.layers, config.distrib1.dim, config.model.linear.type, config.model.activation,
                  bias=config.model.linear.bias, config=config)
    return DualOptimModel(model)


def load_conv_dual(config):
    model = FullyConv2D(config.distrib1.dim, config.model.channels, config.model.kernels, config.model.strides,
                        linear_type=config.model.linear.type, activation=config.model.activation, config=config)
    return DualOptimModel(model)
# Classification.
def load_classify_fc(config):
model = FCNet(config.model.layers,, config.model.linear.type, config.model.activation,
bias=config.model.linear.bias, config=config)
return ClassificationModel(model)
def load_classify_fc_dropout(config):
model = FCNet(config.model.layers,, config.model.linear.type, config.model.activation,
bias=config.model.linear.bias, config=config, dropout=True)
return ClassificationModel(model)
def load_classify_fc_spec_jac(config):
model = FCNet(config.model.layers,, config.model.linear.type, config.model.activation,
bias=config.model.linear.bias, config=config)
return JacSpecClassificationModel(model, config['model']['sn_reg'], config['cuda'])
def load_classify_fc_margin(config):
model = FCNet(config.model.layers,, config.model.linear.type, config.model.activation,
bias=config.model.linear.bias, config=config)
return MarginClassificationModel(model, config)
def load_classify_fc_hinge(config):
model = FCNet(config.model.layers,, config.model.linear.type, config.model.activation,
bias=config.model.linear.bias, config=config)
return HingeLossClassificationModel(model, config)
def load_lenet_classify(config):
model = LeNet(, config.model.output_dim, config.model.linear.type, config.model.activation, config.model.dropout_on,
return ClassificationModel(model)
def CifarResNet32(config):
    block_config = {
        "num_blocks": [5, 5, 5],
        "num_channels": [16, 32, 64],
        "width": 1,
        "pool_size": 8
    }
    return ClassificationModel(ResNet(BasicBlock, block_config, config['data']['class_count']))


def CifarWideResNet32(config):
    block_config = {
        "num_blocks": [5, 5, 5],
        "num_channels": [16, 32, 64],
        "width": 10,
        "pool_size": 8
    }
    return ClassificationModel(ResNet(BasicBlock, block_config, config['data']['class_count']))
from lnets.models.activations.base_activation import Activation
from lnets.models.activations.maxout import Maxout, MaxMin
from lnets.models.activations.identity import Identity
from lnets.models.activations.group_sort import GroupSort
import torch.nn as nn
class Activation(nn.Module):
    def __init__(self):
        super(Activation, self).__init__()

    def forward(self, x):
        raise NotImplementedError
import numpy as np
import torch.nn as nn
class GroupSort(nn.Module):
def __init__(self, num_units, axis=-1):