Mon May 15 2023
Sun May 14 2023

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

NLP
AI model architecture
Sequence modeling
Language modeling
Density estimation
Audio modeling

The paper proposes Megabyte, a multi-scale decoder architecture that enables end-to-end differentiable modeling of sequences of over one million bytes. This allows byte-level models to perform competitively with subword models on long-context language modeling, achieve state-of-the-art density estimation on ImageNet, and model audio from raw files.

Consider Megabyte where byte-level modeling of very long sequences is needed, such as long-context language modeling, image density estimation, and raw audio modeling.
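The summary above does not spell out how the multi-scale decoder works. As a rough illustration only, the sketch below assumes the patch-based design described in the paper: the byte sequence is cut into fixed-size patches, a global model attends across patches, and a local model predicts bytes within each patch. The function names `to_patches` and `attention_pairs`, the zero padding value, and the patch size used in the note are hypothetical choices made for this example, not the authors' code.

```python
def to_patches(data: bytes, patch_size: int, pad: int = 0) -> list[list[int]]:
    """Split a byte string into fixed-size patches of integer byte values.

    The last patch is right-padded with `pad` (an assumed padding value)
    so that every patch has exactly `patch_size` entries.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    values = list(data)
    remainder = len(values) % patch_size
    if remainder:
        values.extend([pad] * (patch_size - remainder))
    return [values[i:i + patch_size] for i in range(0, len(values), patch_size)]


def attention_pairs(seq_len: int, patch_size: int) -> tuple[int, int]:
    """Return rough pairwise-attention counts as (flat, multiscale).

    flat:       one model attending over all seq_len positions.
    multiscale: a global model over the patches plus a local model
                inside each patch. This is a back-of-the-envelope
                estimate that ignores constants and model widths.
    """
    num_patches = -(-seq_len // patch_size)  # ceiling division
    flat = seq_len ** 2
    multiscale = num_patches ** 2 + num_patches * patch_size ** 2
    return flat, multiscale
```

Under these assumptions, `attention_pairs(1_000_000, 1000)` compares about 10^12 pairwise interactions for a flat model with about 10^9 for the two-level layout, which gives a sense of why patching makes million-byte sequences tractable.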

HACK: Learning a Parametric Head and Neck Model for High-fidelity Animation

Computer graphics
Parametric modeling
Biomechanics
Animation

HACK is a parametric model for constructing the head and cervical region of digital humans. It seeks to disentangle the full spectrum of neck and larynx motions, facial expressions, and appearance variations. HACK provides personalized and anatomically consistent controls, particularly for the neck region, which makes those controls more accurate and expressive. It also enables inter-correlation analysis between head and neck for fine-grained motion synthesis and transfer.

Use HACK to create high-fidelity animations with anatomically consistent controls for the head and neck regions.

ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4

Natural language processing
Multimodal learning
Vision-language understanding
Image generation
Web development

ArtGPT-4 is a multimodal model trained on image-text pairs on a Tesla A100 device in just 2 hours, with only about 200 GB of data. The model can depict images with an artistic flair and generate visual code, including aesthetically pleasing HTML/CSS web pages. The paper also proposes novel benchmarks for evaluating vision-language models; on these, ArtGPT-4 scored higher than the current state-of-the-art model and only slightly below artists on a 6-point scale.

Use ArtGPT-4 to describe images with an artistic flair and to generate visually pleasing HTML/CSS web pages.

Universal Source Separation with Weakly Labelled Data

Computational auditory scene analysis
Audio analysis
Sound processing
Music source separation
Sound event separation
Speech enhancement

This paper proposes a universal audio source separation framework that uses weakly labelled audio data to separate arbitrary sound sources with a single model. The proposed system achieved significant improvements across a wide variety of sound classes and tasks, including sound event separation, music source separation, and speech enhancement.

Use this framework where a single model must separate many kinds of sound sources, for example in audio analysis and processing for music, entertainment, and security.

Optimizing Memory Mapping Using Deep Reinforcement Learning

Reinforcement Learning
Resource scheduling
Memory mapping
Cloud computing
Machine learning acceleration

This paper introduces a Reinforcement Learning (RL) agent, mallocMuZero, to solve the memory mapping problem that occurs during compilation of machine learning programs. The proposed system outperformed the default solver used by the Accelerated Linear Algebra (XLA) compiler on a benchmark of realistic ML workloads and improved the execution time of the recently published AlphaTensor matrix multiplication model.

Apply this approach to improve resource scheduling and memory allocation when compiling machine learning programs, for example in cloud computing and machine learning acceleration.
