As one of the most popular sequence-to-sequence modeling approaches for speech recognition, the RNN-Transducer has achieved steadily improving performance with increasingly sophisticated neural network models.
We present RecD (Recommendation Deduplication), a suite of end-to-end infrastructure optimizations across the deep learning recommendation model (DLRM) training pipeline.
RecD addresses the immense storage, preprocessing, and training overheads caused by feature duplication inherent in industry-scale DLRM training datasets.
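As a rough illustration of the kind of deduplication this targets (not RecD's actual storage format; the helper and its names are hypothetical), duplicated categorical feature values in a column can be stored once alongside per-row indices:

```python
def dedup_feature_column(values):
    """Toy sketch: collapse a column of repeated feature values into a list of
    unique values plus per-row indices, so each duplicate is stored only once."""
    uniques, indices, seen = [], [], {}
    for v in values:
        if v not in seen:
            seen[v] = len(uniques)
            uniques.append(v)
        indices.append(seen[v])
    return uniques, indices

# e.g. dedup_feature_column(["u1", "u1", "u1", "u7"]) -> (["u1", "u7"], [0, 0, 0, 1])
```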
In the twilight of Moore's law, GPUs and other specialized hardware accelerators have dramatically sped up neural network training. However, earlier stages of the training pipeline, such as disk I/O and data preprocessing, do not run on accelerators.
In this paper, a Federated Learning (FL) simulation platform is introduced. The target scenario is Acoustic Model training based on this platform. To our knowledge, this is the first attempt to apply FL to Acoustic Model training.
The common pipeline of training deep neural networks consists of several
building blocks such as data augmentation and network architecture selection.
AutoML is a research field that aims at automating these building blocks.
Mixture of Experts (MoE) is able to scale up vision transformers effectively.
However, it requires prohibitive computation resources to train a large MoE transformer. In this paper, we propose Residual Mixture of Experts (RMoE).
Spiking neural networks (SNNs) are a typical kind of brain-inspired model, known for their rich neuronal dynamics, diverse coding schemes, and low power consumption.
Graph Neural Networks (GNNs) are powerful and flexible neural networks that
use the naturally sparse connectivity information of the data. GNNs represent
this connectivity as sparse matrices.
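A minimal sketch of what this sparse-matrix view means in practice, assuming a PyTorch setting and an unnormalized adjacency (both assumptions, not from the excerpt): neighborhood aggregation in a GNN layer becomes a sparse-dense matrix product.

```python
import torch

def aggregate_neighbors(edge_index, num_nodes, node_features):
    # edge_index: (2, E) tensor of (source, target) pairs, i.e. the graph's sparse connectivity
    values = torch.ones(edge_index.size(1))
    adj = torch.sparse_coo_tensor(edge_index, values, (num_nodes, num_nodes))
    # row i of the result sums the features of the nodes that node i is connected to
    return torch.sparse.mm(adj, node_features)
```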
The upcoming exascale era will provide a new generation of physics
simulations. These simulations will have a high spatiotemporal resolution,
which will impact the training of machine learning models.
We find Mask2Former also achieves state-of-the-art performance on video
instance segmentation without modifying the architecture, the loss or even the
training pipeline. In this report, we show universal image segmentation architectures trivially generalize to video segmentation by directly predicting 3D segmentation volumes.
We investigate filter level sparsity that emerges in convolutional neural
networks (CNNs) which employ Batch Normalization and ReLU activation, and are
trained with adaptive gradient descent techniques.
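One simple way to quantify such filter level sparsity is sketched below, under the assumption that per-filter BatchNorm scale parameters can serve as a proxy for inactive filters; the threshold and names are illustrative.

```python
import torch.nn as nn

def filter_level_sparsity(model, threshold=1e-3):
    """Fraction of convolutional filters whose BatchNorm scale (gamma) has
    effectively collapsed to zero, taken here as a proxy for an inactive filter."""
    zeroed = total = 0
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            gamma = module.weight.detach().abs()
            zeroed += int((gamma < threshold).sum())
            total += gamma.numel()
    return zeroed / max(total, 1)
```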
Dynamic state representation learning is an important task in robot learning, with wide application in areas such as accelerating model-free reinforcement learning, closing the simulation-to-reality gap, and reducing motion planning complexity.
However, current dynamic state representation learning methods scale poorly on complex dynamic systems such as deformable objects and cannot directly embed well-defined simulation functions into the training pipeline.
In this work, we propose to progressively increase the training difficulty during the training of a neural network model via a novel strategy which we call mini-batch trimming.
Mini-batch trimming ensures that, in the later training stages, the optimizer puts its focus on the more difficult samples, which we identify as the ones with the highest loss in the current mini-batch.
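A minimal sketch of this selection step, assuming a standard PyTorch cross-entropy setup; the keep fraction, its schedule, and all names are illustrative assumptions rather than the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def trimmed_batch_loss(logits, targets, keep_fraction=0.5):
    """Average the loss only over the hardest samples in the mini-batch,
    i.e. the ones with the highest per-sample loss."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    k = max(1, int(keep_fraction * per_sample.numel()))
    hardest, _ = torch.topk(per_sample, k)
    return hardest.mean()
```

If the training difficulty is meant to increase progressively, keep_fraction would be decreased over the course of training so that ever fewer, harder samples contribute to the loss.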
Deep learning has made great progress in recent years, but the exploding economic and environmental costs of training neural networks are becoming unsustainable.
To address this problem, there has been a great deal of research on *algorithmically-efficient deep learning*, which seeks to reduce training costs not at the hardware or implementation level, but through changes in the semantics of the training program.
Due to the difficulty of obtaining ground-truth labels, learning from
virtual-world datasets is of great interest for real-world applications like
semantic segmentation.
We present a simple open-vocabulary object detection method built upon frozen vision and language models.
Surprisingly, we observe that a frozen vision model 1) retains the locality-sensitive features necessary for detection, and 2) is a strong region classifier.
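A hedged sketch of what using a frozen vision model as a region classifier could look like; the pooling choice, tensor shapes, and all names are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def score_regions(frozen_features, boxes, text_embeddings):
    """frozen_features: (C, H, W) feature map from the frozen vision backbone.
    boxes: integer (x0, y0, x1, y1) coordinates in feature-map units.
    text_embeddings: (num_classes, C) embeddings of the category names."""
    text = F.normalize(text_embeddings, dim=1)
    scores = []
    for x0, y0, x1, y1 in boxes:
        region = frozen_features[:, y0:y1, x0:x1].mean(dim=(1, 2))  # pooled region embedding
        scores.append(F.normalize(region, dim=0) @ text.T)          # cosine similarity per class
    return torch.stack(scores)                                      # (num_boxes, num_classes)
```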
We developed a training pipeline which can train a deep learning-based object detection model with partially annotated whole slide images (WSIs) and compensate for class imbalances on the fly.
With this approach we can freely sample from annotated areas and are not restricted to fully annotated sub-images extracted from the WSI, as with classical approaches.
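The sampling idea could be sketched roughly as follows, assuming per-class lists of annotated regions and inverse-frequency weighting as the on-the-fly balancing rule; all of this, including the names, is illustrative rather than the authors' exact scheme.

```python
import random

def sample_annotated_patch(annotated_regions, rng=random):
    """annotated_regions: dict mapping class label -> list of (x, y, w, h) boxes
    inside the annotated areas of a whole slide image."""
    counts = {c: len(boxes) for c, boxes in annotated_regions.items()}
    total = sum(counts.values())
    # rarer classes get proportionally larger sampling weights
    weights = [total / counts[c] for c in counts]
    cls = rng.choices(list(counts), weights=weights, k=1)[0]
    return cls, rng.choice(annotated_regions[cls])  # crop this box from the WSI
```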
Analysis of vision-and-language models has revealed their brittleness under
linguistic phenomena such as paraphrasing, negation, textual entailment, and
word substitutions with synonyms or antonyms.