Our goal is to understand why the robustness drops after conducting
adversarial training for too long. Although this phenomenon is commonly
explained as overfitting, our analysis suggests that its prim
The quality of artificially generated texts has considerably improved with
the advent of transformers. The question of using these models to generate
learning data for supervised learning tasks natura
Multi-head attention plays a crucial role in the recent success of
Transformer models, leading to consistent performance improvements over
conventional attention in various applications. The popul
Pre-training is a dominant paradigm in computer vision. For example,
supervised ImageNet pre-training is commonly used to initialize the backbones
of object detection and segmentation models. He et al
The deep neural networks used in modern computer vision systems require
enormous image datasets to train them. These carefully-curated datasets
typically have a million or more images, across a thousa
Self-training and unsupervised pre-training have emerged as effective
approaches to improve speech recognition systems using unlabeled data. However,
it is not clear whether they learn similar pattern
This paper introduces an information theoretic co-training objective for
unsupervised learning. We consider the problem of predicting the future. Rather
than predict future sensations (image pixels or
Spiking neural networks are a promising approach towards next-generation
models of the brain in computational neuroscience. Moreover, compared to
classic artificial neural networks, they could serve a
Adversarial training algorithms have proven reliable for improving
machine learning models' robustness against adversarial examples. However, we
find that adversarial training algorithms tend
At the heart of deep learning we aim to use neural networks as function
approximators: training them to produce outputs from inputs in emulation of a
ground truth function or data creation process. I
Some of the hardest problems in deep learning can be solved with the combined
effort of many independent parties, as is the case for volunteer computing and
federated learning. These setups rely on hi
Federated learning (FL) systems conduct adversarial training even when a quorum of workers could be completely malicious.
We model an attacker who poisons the model to insert a weakness into adversarial training, such that the model displays apparent adversarial robustness while the attacker can exploit the inserted weakness to bypass the adversarial training and force the model to misclassify adversarial examples.
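For reference, the following is a minimal sketch of standard PGD-based adversarial training (in the style of Madry et al.), i.e., the kind of training procedure such an attacker would aim to subvert; the model, optimizer, and hyperparameters eps, alpha, and steps are illustrative assumptions and are not taken from the abstract above.

# Minimal sketch of PGD-based adversarial training; hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity bounded adversarial examples with projected gradient descent."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()            # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                               # keep pixels in [0, 1]
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One training step on adversarial examples only (the robust training objective)."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()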
In recent years, semi-supervised algorithms have received a lot of interest
in both academia and industry. Among the existing techniques, self-training
methods have arguably received more attention in
Recent work has demonstrated that pre-training in-domain language models can
boost performance when adapting to a new domain. However, the costs associated
with pre-training raise an important questio
Lottery tickets (LTs) are able to discover accurate and sparse subnetworks
that could be trained in isolation to match the performance of dense networks.
Ensemble, in parallel, is one of the oldest tim
Overparameterized deep networks have the capacity to memorize training data
with zero training error. Even after memorization, the training loss continues
to approach zero, making the model overconfid
Inspired by its success in natural language processing and computer vision,
pre-training has attracted substantial attention in cheminformatics and
bioinformatics, especially for molecule-based tasks.