
Question about text condition embedding shape of musicgen-melody in training phase #482

Open
Lonian6 opened this issue Jul 29, 2024 · 0 comments

Lonian6 commented Jul 29, 2024

Hello, I have a question about training the musicgen-melody model.
The text condition controls the result by concatenating the text embedding in front of the input sequence.
I tried printing the return value of the ConditionFuser (model.lm.fuser) in the snippet below. The Length of the "input_" tensor, with shape (Batch, Length, 1536), seems to change depending on the maximum text-embedding length within a batch.

Is the Length variable during training as well? If not, what is the prefix length of the text embedding during training?

import torch
from audiocraft.models import MusicGen
from audiocraft.modules.conditioners import ClassifierFreeGuidanceDropout

model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8)

input_text = ['text_1', 'text_2', 'text_3']

attributes, prompt_tokens = model._prepare_tokens_and_attributes(input_text, None)
conditions = attributes
# Prepare the unconditional branch for classifier-free guidance (CFG).
null_conditions = ClassifierFreeGuidanceDropout(p=1.0)(conditions)
if conditions:
    conditions = conditions + null_conditions
    tokenized = model.lm.condition_provider.tokenize(conditions)
    cfg_conditions = model.lm.condition_provider(tokenized)

# Empty prompt: one row per text, then doubled along the batch dimension
# to match the CFG-doubled conditions above.
prompt = torch.zeros((len(input_text), 4, 0), dtype=torch.long, device=model.device)
prompt = torch.cat([prompt, prompt], dim=0)

# Sum the embeddings of the four codebooks, then fuse with the text conditions.
input_ = sum([model.lm.emb[k](prompt[:, k]) for k in range(4)])
input_, cross_attention_input = model.lm.fuser(input_, cfg_conditions)
print(input_.shape)
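For reference, the "prepend" fusing I am describing can be sketched as below. This is a minimal illustration, not audiocraft's actual implementation: the `prepend_fuse` helper and the example lengths are assumptions. Text embeddings are padded to the longest sequence in the batch before being concatenated in front of the token embeddings, which is why the fused Length varies from batch to batch.

```python
import torch

def prepend_fuse(text_emb: torch.Tensor, token_emb: torch.Tensor) -> torch.Tensor:
    """Hypothetical prefix fusing.

    text_emb:  (B, T_text, D), padded to the batch's longest text.
    token_emb: (B, T_tokens, D), embeddings of the codebook tokens.
    Returns (B, T_text + T_tokens, D).
    """
    return torch.cat([text_emb, token_emb], dim=1)

# Two batches whose longest tokenized text differs: the fused length differs too.
D = 1536
short_batch = prepend_fuse(torch.zeros(2, 5, D), torch.zeros(2, 50, D))
long_batch = prepend_fuse(torch.zeros(2, 12, D), torch.zeros(2, 50, D))
print(short_batch.shape)  # torch.Size([2, 55, 1536])
print(long_batch.shape)   # torch.Size([2, 62, 1536])
```

Under this reading, a fixed prefix length during training would require padding or truncating the text embeddings to a fixed number of tokens.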