In this paper, we propose a novel backbone network, namely CBNetV2, by constructing compositions of existing open-sourced pre-trained backbones.
Specifically, CBNetV2 integrates the high- and low-level features of multiple backbone networks and gradually expands the receptive field to perform object detection more efficiently.
Developing a robust algorithm to diagnose and quantify the severity of
COVID-19 using Chest X-ray (CXR) requires a large number of well-curated
COVID-19 datasets, which are difficult to collect under t
Network pruning has been the driving force for the efficient inference of neural networks and the alleviation of model storage and transmission burden. Traditional network pruning methods focus on the
This work designs one-stage lightweight detectors that perform well in terms of mAP and latency.
Benchmark datasets and proposed detectors are analyzed in terms of the number of parameters, GFLOPs, GPU latency, CPU latency, and mAP on the MS COCO dataset, a benchmark dataset in object detection.
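The parameter and GFLOPs figures in a comparison like this can be estimated analytically, layer by layer. The sketch below does so for a single standard 2D convolution. The function names and the example layer shape are illustrative assumptions, not taken from the paper, and one multiply-accumulate is counted as two FLOPs (some tools report MACs instead).

```python
def conv2d_params(c_in, c_out, kernel, bias=True):
    """Number of learnable parameters in a standard (groups=1) 2D convolution."""
    weights = c_in * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


def conv2d_gflops(c_in, c_out, kernel, out_h, out_w):
    """Approximate GFLOPs for one forward pass, ignoring the bias add."""
    macs = c_in * c_out * kernel * kernel * out_h * out_w
    return 2 * macs / 1e9  # assumption: 1 MAC = 2 FLOPs


# Hypothetical layer: 3x3 convolution, 64 -> 128 channels, 80x80 output map.
params = conv2d_params(64, 128, 3)
gflops = conv2d_gflops(64, 128, 3, 80, 80)
print(params, round(gflops, 3))
```

Latency, unlike these two quantities, cannot be derived from layer shapes alone and has to be measured on the target GPU or CPU.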
Various applications of voice synthesis have been developed independently
despite the fact that they generate "voice" as output in common. In addition,
most of the voice synthesis models still require
Deep learning has driven tremendous growth in hashing techniques for image
retrieval. Recently, the Transformer has emerged as a new architecture by utilizing
self-attention without convolution. Transfor
Existing object detection algorithms applied to complex fire scenarios
suffer from poor detection accuracy, slow speed, and difficult
deployment. In response, this paper proposes a lightweight fire
Fine-tuning large pre-trained models on downstream tasks has been adopted in
a variety of domains recently. However, it is costly to update the entire
parameter set of large pre-trained models. Althou
Recently, a massive number of deep learning based approaches have been
successfully applied to various remote sensing image (RSI) recognition tasks.
However, most existing advances of deep learning me
While deep learning has achieved phenomenal successes in many AI
applications, its enormous model size and intensive computation requirements
pose a formidable challenge to the deployment in resource-
Convolutional neural networks (CNNs) have been proven to be powerful feature extractors in hyperspectral (HS) image classification, but fail to mine and represent the sequence attributes of spectral signatures well due to the limitations of their inherent network backbone.
To solve this issue, we rethink the classification problem from a sequential perspective with transformers, and propose a novel backbone network called SpectralFormer.
To further augment the generalization capability of the deep learning model to various vendors with limited resources, a new contrastive learning scheme is developed.
Specifically, the backbone network is first trained with a multi-style and multi-view unsupervised self-learning scheme to embed features that are invariant to various vendor styles.
Because the conventional manual method of bridge crack detection
wastes a large amount of human and material resources,
this study aims to propose a lightweight, high
Under the global COVID-19 crisis, developing a robust diagnosis algorithm for
COVID-19 using CXR is hampered by the lack of well-curated COVID-19 data
sets, although CXR data with other diseases are a
We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone
network for object detection. This design enables the original ViT architecture
to be fine-tuned for object detection with
This paper proposes an adaptive auxiliary task learning based approach for
object counting problems. Unlike existing auxiliary task learning based
methods, we develop an attention-enhanced adaptively
Deep learning has achieved excellent performance in various computer vision
tasks, but requires a lot of training examples with clean labels. It is easy to
collect a dataset with noisy labels, but suc
We address representation learning for large-scale instance-level image
retrieval. Apart from the backbone, training pipelines and loss functions, popular
approaches have focused on different spatial pool
We develop a novel learning scheme named Self-Prediction for 3D instance and
semantic segmentation of point clouds. Distinct from most existing methods that
focus on designing convolutional operators,
We advance a novel backbone network, xmorpher, for the effective corresponding feature representation in deformable medical image registration (dmir).
It proposes a novel full transformer architecture including dual parallel feature extraction networks which exchange information through cross attention, thus discovering multi-level semantic correspondence while extracting respective features gradually for final effective registration.
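The information exchange through cross attention can be illustrated with plain scaled dot-product attention, where queries come from one branch and keys and values from the other. This is a minimal single-head sketch without learned projections, not the XMorpher block itself. The names `cross_attention`, `moving`, and `fixed`, and the toy token values, are assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: queries from one branch, keys/values from the other."""
    scale = math.sqrt(len(keys[0]))
    output = []
    for q in queries:
        # Similarity of this query token to every token of the other branch.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other branch's value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, values))
                       for j in range(len(values[0]))])
    return output


# Toy tokens for the two images (hypothetical values).
moving = [[1.0, 0.0], [0.0, 1.0]]
fixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = cross_attention(moving, fixed, fixed)
```

Running the same function with the arguments swapped gives the second direction of the exchange, so each branch receives features weighted by its similarity to the other.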