WIP: Add stem activations generate script #49

Closed
faroit wants to merge 6 commits into marl:medleydb_v1.2 from faroit:medleydb_v1.2

Conversation

faroit
Contributor

@faroit faroit commented Sep 29, 2016

This addresses #25.
This is currently a work in progress. Things that are still missing:

  • Apply some heuristic to tracks with bleed
  • Write test against annotations in repo
  • Add documentation
  • Fix different framing (buffer vs librosa frame generator)

@coveralls

coveralls commented Sep 29, 2016

Coverage Status

Coverage decreased (-5.8%) to 62.34% when pulling d0de4d1 on faroit:medleydb_v1.2 into 3cef5f9 on marl:medleydb_v1.2.

@rabitt
Contributor

rabitt commented Oct 7, 2016

@faroit Let me know if you need any support on this. I'm happy to help :)

@faroit
Contributor Author

faroit commented Oct 7, 2016

@rabitt cool thanks

  1. It would help to add the samples dataset, because not having it holds me back from working on this from home ;-)
  2. I need to fix the framing, because it causes the results to be slightly different. The original MATLAB implementation uses buffer(wave,win_len,win_len-hop_len); this seems to be a bit more difficult to replicate with librosa.util.frame because of a weird padding scheme MATLAB uses...

@rabitt
Contributor

rabitt commented Oct 11, 2016

@faroit the samples dataset is now downloaded in the travis environment in the latest version of the 1.2 branch, and I set the MEDLEYDB_PATH so you can access the audio. Let me know if this works for you!

@rabitt
Contributor

rabitt commented Oct 11, 2016

> I need to fix the framing, because that causes the results to be slightly different. The original matlab implementation uses buffer(wave,win_len,win_len-hop_len); This seems to be a bit more difficult to replicate with librosa.util.frame because of a weird padding scheme MATLAB uses...

Maybe @bmcfee has run into this in the past?

@faroit
Contributor Author

faroit commented Oct 11, 2016

> @faroit the samples dataset is now downloaded in the travis environment in the latest version of the 1.2 branch, and I set the MEDLEYDB_PATH so you can access the audio. Let me know if this works for you!

👍 that's so great. I will give it a try tomorrow.

@faroit
Contributor Author

faroit commented Oct 11, 2016

> > I need to fix the framing, because that causes the results to be slightly different. The original matlab implementation uses buffer(wave,win_len,win_len-hop_len); This seems to be a bit more difficult to replicate with librosa.util.frame because of a weird padding scheme MATLAB uses...
>
> Maybe @bmcfee has run into this in the past?

It's actually more a problem of me being too lazy to install MATLAB here... so if you or @bmcfee have a snippet lying around that is a 1:1 equivalent of MATLAB's buffer, that would be cool. If not... I can do it....

@bmcfee
Contributor

bmcfee commented Oct 11, 2016

> Maybe @bmcfee has run into this in the past?

Not that I recall, but I think librosa.util.frame does exactly what you want. (I don't know what padding issues you're talking about though.)

From the MATLAB docs:

> y = buffer(x,n) partitions a length-L signal vector x into nonoverlapping data segments (frames) of length n. Each data frame occupies one column of matrix output y, which has n rows and ceil(L/n) columns. If L is not evenly divisible by n, the last column is zero-padded to length n.

In librosa, this would be:

y = librosa.util.frame(x, frame_length=n, hop_length=n)

If you want to do this with end-padding (which invokes a copy):

x_pad = librosa.util.fix_length(x, size=int(n * np.ceil(len(x) / float(n))))
y = librosa.util.frame(x_pad, frame_length=n, hop_length=n)

If you want overlap, then change the hop_length accordingly.
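
For the overlapping call `buffer(wave, win_len, win_len - hop_len)` discussed above, MATLAB's documented default is to place `p` zeros (the overlap) ahead of the first frame, in addition to zero-padding the last frame. That is probably the padding difference mentioned earlier in the thread. Below is a minimal NumPy sketch of that behaviour, assuming `0 <= p < n` and the default all-zero initial condition. `matlab_buffer` is a hypothetical helper name, not part of librosa or MedleyDB.

```python
import numpy as np


def matlab_buffer(x, n, p=0):
    """Sketch of MATLAB's buffer(x, n, p) for 0 <= p < n (hypothetical helper).

    Returns an array of shape (n, n_frames), one frame per column.
    """
    x = np.asarray(x, dtype=float)
    hop = n - p
    n_frames = int(np.ceil(len(x) / hop))
    # Room for p leading zeros, the signal, and trailing zeros in the last frame.
    padded = np.zeros((n_frames - 1) * hop + n)
    padded[p:p + len(x)] = x
    # Row i, column j picks sample i + j * hop of the padded signal.
    idx = np.arange(n)[:, None] + hop * np.arange(n_frames)[None, :]
    return padded[idx]


frames = matlab_buffer(np.arange(1, 19), n=8, p=4)  # shape (8, 5)
```

With `p=0` this reduces to the non-overlapping, end-padded case described above.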

@coveralls

coveralls commented Oct 12, 2016

Coverage Status

Coverage decreased (-5.7%) to 61.699% when pulling 61ba9db on faroit:medleydb_v1.2 into 3eb3890 on marl:medleydb_v1.2.

@faroit
Contributor Author

faroit commented Oct 12, 2016

@bmcfee thanks, this seems to work

@coveralls

coveralls commented Oct 12, 2016

Coverage Status

Coverage decreased (-6.02%) to 61.404% when pulling 2f5ef02 on faroit:medleydb_v1.2 into 3eb3890 on marl:medleydb_v1.2.

@faroit
Contributor Author

faroit commented Oct 12, 2016

@rabitt next up: the values are a bit different. I will install MATLAB tomorrow, see if I can reproduce the annotations there, and then step through the code.

See the first 10 lines from the csv:

Reference annotation

time,S01,S02,S03,S04,S05
0.0000,0.0475,0.0474,0.0474,0.0481,0.0474
0.0464,0.0489,0.0480,0.0481,0.3278,0.0481
0.0929,0.0502,0.0486,0.0488,0.8131,0.0488
0.1393,0.0515,0.0492,0.0494,0.9718,0.0494
0.1858,0.0525,0.0497,0.0500,0.9957,0.0500
0.2322,0.0534,0.0501,0.0505,0.9992,0.0505
0.2786,0.0539,0.0505,0.0510,0.9998,0.0509
0.3251,0.0543,0.0509,0.0514,0.9999,0.0512
0.3715,0.0545,0.0511,0.0516,1.0000,0.0515

created with 2f5ef02

time,S01,S02,S03,S04,S05
0.0000,0.0475,0.0474,0.0474,0.0442,0.0474
0.0464,0.0494,0.0488,0.0489,0.3436,0.0489
0.0929,0.0514,0.0502,0.0504,0.8447,0.0504
0.1393,0.0532,0.0515,0.0519,0.9799,0.0518
0.1858,0.0548,0.0527,0.0532,0.9972,0.0531
0.2322,0.0561,0.0538,0.0543,0.9995,0.0542
0.2786,0.0572,0.0547,0.0553,0.9999,0.0552
0.3251,0.0580,0.0555,0.0561,1.0000,0.0560
0.3715,0.0585,0.0561,0.0567,1.0000,0.0566

it's not that far off, but the idea is to test against the reference (not creating new annotations with python), right?

@rabitt
Contributor

rabitt commented Oct 13, 2016

> it's not that far off, but the idea is to test against the reference (not creating new annotations with python), right?

Ideally the output should match the annotations. Let me know what you find after checking against matlab.
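
One way to make "match the annotations" concrete is to compare the two CSVs numerically with a tolerance. The sketch below uses the first three rows of the two tables posted above. The helper name `load_activations` and the tolerance `atol=1e-3` are assumptions for illustration, not values from the repository.

```python
import io

import numpy as np

reference_csv = """time,S01,S02,S03,S04,S05
0.0000,0.0475,0.0474,0.0474,0.0481,0.0474
0.0464,0.0489,0.0480,0.0481,0.3278,0.0481
0.0929,0.0502,0.0486,0.0488,0.8131,0.0488
"""

generated_csv = """time,S01,S02,S03,S04,S05
0.0000,0.0475,0.0474,0.0474,0.0442,0.0474
0.0464,0.0494,0.0488,0.0489,0.3436,0.0489
0.0929,0.0514,0.0502,0.0504,0.8447,0.0504
"""


def load_activations(text):
    """Parse an activation CSV (header row plus numeric rows) into an array."""
    return np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)


ref = load_activations(reference_csv)
gen = load_activations(generated_csv)

# Largest absolute deviation over the stem columns (column 0 is time).
max_diff = np.max(np.abs(ref[:, 1:] - gen[:, 1:]))
# Assumed tolerance; a real test would pick this deliberately.
matches = np.allclose(ref, gen, atol=1e-3)
print(max_diff, matches)
```

On these rows the comparison fails, which is consistent with the "a bit different" observation above.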

H = []

for track_id, track in mtrack.stems.items():
    audio, rate = librosa.load(track.file_path, mono=False)
Contributor

By default, librosa loads the audio at sr=22050, but I think the activations were originally computed at the original sample rate, sr=44100.

Why mono=False?

Contributor Author

The time vectors were indicating otherwise, but I have now found that it's actually the window length parameter that needs to be changed. So sr=44100 and win_length=4096 seem to match.

Results are still a bit off. I am working on it...

time,S01,S02,S03,S04,S05
0.0000,0.9927,0.6062,0.1231,0.5787,0.8933
0.0464,0.9966,0.9341,0.3208,0.8458,0.9880
0.0929,0.9983,0.9919,0.6236,0.9549,0.9987
0.1393,0.9991,0.9989,0.8572,0.9870,0.9998
0.1858,0.9994,0.9998,0.9567,0.9959,1.0000
0.2322,0.9996,1.0000,0.9877,0.9985,1.0000
0.2786,0.9997,1.0000,0.9964,0.9994,1.0000
0.3251,0.9997,1.0000,0.9988,0.9997,1.0000
0.3715,0.9998,1.0000,0.9996,0.9998,1.0000

vs

time,S01,S02,S03,S04,S05
0.0000,0.9932,0.5501,0.0728,0.5285,0.8885
0.0464,0.9967,0.9246,0.2522,0.8291,0.9870
0.0929,0.9983,0.9913,0.5889,0.9523,0.9985
0.1393,0.9990,0.9989,0.8569,0.9868,0.9998
0.1858,0.9993,0.9998,0.9607,0.9960,1.0000
0.2322,0.9995,1.0000,0.9896,0.9986,1.0000
0.2786,0.9996,1.0000,0.9971,0.9994,1.0000
0.3251,0.9997,1.0000,0.9991,0.9997,1.0000
0.3715,0.9997,1.0000,0.9997,0.9999,1.0000

seems that the framing might still be an issue...
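
A possible reason the time vectors looked right at the lower sample rate: the frame spacing depends only on the ratio of hop length to sample rate. The sketch below assumes a hop of half the window (2048 samples at 44100 Hz, 1024 samples at 22050 Hz). Those hop values are assumptions; the thread only states `win_length=4096` and the roughly 0.0464 s step visible in the tables.

```python
def frame_times(n_frames, hop_length, sr):
    """Start time in seconds of each frame for a given hop and sample rate."""
    return [i * hop_length / sr for i in range(n_frames)]


# Assumed hop lengths (half of an assumed window); not confirmed by the thread.
times_44k = frame_times(3, hop_length=2048, sr=44100)
times_22k = frame_times(3, hop_length=1024, sr=22050)

print([round(t, 4) for t in times_44k])
print([round(t, 4) for t in times_22k])
```

Both settings give the same time column, so the timestamps alone cannot tell the two configurations apart.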

# MATLAB equivalent to @hanning(win_len)
win = scipy.signal.windows.hann(win_len + 2)[1:-1]

# mix down to 1 channel
Contributor

why not do this on load?

Contributor Author

right
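
The window line in the snippet above relies on MATLAB's `hanning(N)` leaving out the zero-valued endpoints, so a symmetric Hann window of length `N + 2` with both ends trimmed should give the same values. A small check of that identity follows, using `np.hanning` in place of `scipy.signal.windows.hann` to avoid the SciPy dependency. `matlab_hanning` is a hypothetical helper that writes out the closed-form definition.

```python
import numpy as np


def matlab_hanning(n):
    """Closed form for MATLAB's hanning(n): samples k = 1..n of a period n + 1."""
    k = np.arange(1, n + 1)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * k / (n + 1)))


win_len = 8
# Symmetric Hann of length win_len + 2 with the two zero endpoints removed.
trimmed = np.hanning(win_len + 2)[1:-1]
same = np.allclose(trimmed, matlab_hanning(win_len))
print(same)
```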

@@ -188,8 +192,8 @@ def test_add_sequence_to_melody4(self):
         print(expected)
         self.assertTrue(array_almost_equal(actual, expected))
 
-    def test_add_sequence_to_melody4(self):
+    def test_add_sequence_to_melody5(self):
Contributor

good catch!

@rabitt
Contributor

rabitt commented Oct 13, 2016

re. the first item:

  • apply some heuristic to tracks with bleed

for now, throw a warning for tracks with bleed and just turn up the confidence needed to be considered "active". We're in the process of plugging in an algorithm by @TGabor that performs bleed removal, so we'll be able to use the same activation code for everything.
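
A rough sketch of that interim approach: warn when a track has bleed and require a higher confidence before a frame counts as active. The function name `binarize_activations` and both threshold values are assumptions for illustration. The thread does not specify them.

```python
import warnings

import numpy as np


def binarize_activations(confidence, has_bleed, threshold=0.5, bleed_threshold=0.8):
    """Turn activation confidences into booleans (hypothetical helper).

    Both thresholds are assumed values, not taken from MedleyDB.
    """
    if has_bleed:
        warnings.warn("Track has bleed; using a stricter activation threshold.")
        threshold = bleed_threshold
    return np.asarray(confidence) >= threshold


active_clean = binarize_activations([0.3, 0.6, 0.9], has_bleed=False)
```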

@coveralls

coveralls commented Oct 13, 2016

Coverage Status

Coverage decreased (-6.02%) to 61.404% when pulling 4a21345 on faroit:medleydb_v1.2 into 3eb3890 on marl:medleydb_v1.2.

@rabitt
Contributor

rabitt commented Oct 16, 2016

fyi, I (finally) merged the v1.2 branch to master, so this PR can eventually be pulled into master.

@rabitt
Contributor

rabitt commented Nov 8, 2016

Hey @faroit! Wanted to check in on the status of this.

@rabitt
Contributor

rabitt commented Nov 8, 2016

fyi @pli1988 happens to be working on code to convert from activation confidence values to the sourceid annotations.

@faroit
Contributor Author

faroit commented Nov 17, 2016

@rabitt sorry, quite busy over here. I will continue working on the PR over the course of the weekend.

# binary thresholding for low overall energy events
mask = np.ones(H.shape)
mask[:, E0 < 0.01] = 0
H = H * mask
Contributor

You could replace lines 36-38 with this one-liner:
H[:, E0 < 0.01] = 0.0
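
A quick check that the mask-based version and the one-liner zero out the same columns. Here `H` and `E0` are random or made-up stand-ins, and `E0` is assumed to be a per-frame energy vector whose length equals the number of columns of `H`.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.random((3, 6))                              # stand-in: stems x frames
E0 = np.array([0.5, 0.001, 0.2, 0.0, 0.3, 0.009])   # stand-in per-frame energy

# Original three-line version: build a mask, then multiply.
mask = np.ones(H.shape)
mask[:, E0 < 0.01] = 0
H_masked = H * mask

# Suggested one-liner, applied to a copy so the two results can be compared.
H_inplace = H.copy()
H_inplace[:, E0 < 0.01] = 0.0

equal = np.array_equal(H_masked, H_inplace)
print(equal)
```

One difference worth noting: the one-liner modifies `H` in place, while the mask version builds a new array.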

@faroit
Contributor Author

faroit commented Jan 15, 2017

@rabitt sorry for not finishing this up. Do you want to take over? Can you edit this PR, or should I close it?

@rabitt
Contributor

rabitt commented Jan 16, 2017

@faroit yes, I'll take over (see PR #63). I would have pushed on top of this branch, but I didn't have write access to your fork.
