Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExoPlayer wrongly decode some MP3 file #1376

Closed
1 task
Tolriq opened this issue May 15, 2024 · 8 comments
Closed
1 task

ExoPlayer wrongly decode some MP3 file #1376

Tolriq opened this issue May 15, 2024 · 8 comments
Assignees
Labels

Comments

@Tolriq
Copy link
Contributor

Tolriq commented May 15, 2024

Version

Media3 1.3.1

More version details

No response

Devices that reproduce the issue

Any

Devices that do not reproduce the issue

No response

Reproducible in the demo app?

Yes

Reproduction steps

Play attached file, it will show a duration of more than 1 hours and will keep playing without sound for that duration.
The file plays well in other players and the actual duration is properly reported by taglib or ffmpeg or Android.

Expected result

File have correct duration and does not play without sound for 1 hour.

Note that I had a few report of this over the years from different users.

Actual result

The file duration is completely off and it keeps playing with no sound for that duration.

Media

003.zip

Bug Report

@icbaker
Copy link
Collaborator

icbaker commented May 16, 2024

By chance I happened to test this file in the 1.2.1 version of the demo app, and there I see a duration of 7:17 and playback stops at 7:18 (this was after seeking, so the mismatch is likely due to MP3 + Xing header resulting in inherently innacurate seeking).

At 1.3.0 I see the issue you describe, the duration is shown as 1:12:51.

The culprit is this change: 4061d476a14314867da2f74ba7049c85568b56eb\

Reverting that on top of 1.3.0 restores the duration to 7:17.

The issue is that this file is variable bit rate (VBR) (as reported by mediainfo), but it has an Info frame, which generally indicates constant bitrate (CBR) (Xing frames are usually used for VBR files) - and the change I linked above started assuming that Info frames imply CBR, in order to improve seeking accuracy: #878

This results in calculating the duration of the stream by assuming all frames are the same size as the first frame after the Info frame (since that's true in a CBR stream):

// TODO: b/319235116 - Consider using the duration derived from the Xing/Info frame when
// it considers encoding delay and padding.
seeker =
getConstantBitrateSeeker(input, streamLength, /* allowSeeksIfLengthUnknown= */ false);

/**
* Peeks the next frame and returns a {@link ConstantBitrateSeeker} based on its bitrate. {@code
* streamLengthFallback} is used if {@link ExtractorInput#getLength() input.getLength()} is {@link
* C#LENGTH_UNSET}. {@code streamLengthFallback} may also be {@link C#LENGTH_UNSET} to indicate
* the length is unknown.
*/
private Seeker getConstantBitrateSeeker(
ExtractorInput input, long streamLengthFallback, boolean allowSeeksIfLengthUnknown)
throws IOException {
input.peekFully(scratch.getData(), 0, 4);
scratch.setPosition(0);
synchronizedHeader.setForHeaderData(scratch.readInt());
return new ConstantBitrateSeeker(
input.getLength() != C.LENGTH_UNSET ? input.getLength() : streamLengthFallback,
input.getPosition(),
synchronizedHeader,
allowSeeksIfLengthUnknown);
}

In this case, the file seems to have a single 104 byte, 32kbps frame immediately after the Info frame (visible with ffprobe -show_packets) - so ExoPlayer derives the duration assuming the rest of the stream is 32kbps. Every subsequent frame is then either 1044 or 1045 bytes, so it's effectively (if not technically) CBR.

Implementing the TODO related to deriving the duration from the Info header resolves the issue: We then initialize the ConstantBitrateSeeker with a bitrate and frame size derived from the Xing/Info frame frameCount, dataSize (or the InputStream.length if known), samplesPerFrame, and sampleRate values. This also ensures that when seeking the playback position ends at 7:17, indicating the seeking is more accurate than is possible from the Info header.

I will work on tidying up this change, and sending it for internal review.

@icbaker
Copy link
Collaborator

icbaker commented May 16, 2024

In this case, the file seems to have a single 104 byte, 32kbps frame immediately after the Info frame (visible with ffprobe -show_packets) - so ExoPlayer derives the duration assuming the rest of the stream is 32kbps. Every subsequent frame is then either 1044 or 1045 bytes, so it's effectively (if not technically) CBR.

Correction/clarification - all the 1044/1045 byte frames are 320kbps, so CBR - the one byte difference comes from the padding value in the frame header.

@Tolriq
Copy link
Contributor Author

Tolriq commented May 16, 2024

Thanks for the details and the probable fix.

@icbaker
Copy link
Collaborator

icbaker commented May 17, 2024

ExoPlayer complains about this file for another, related, reason: the Info frame's 'data size' field evaluates to 17463482 (0x010A78BA), but the file is 17510998 bytes long.

Even if we consider that the Xing.dataSize value is intended to measure from the start of the Xing header (0x5DB9 = 23993), it still doesn't match up:

17463482 + 23993 = 17487475 != 17510998 (difference of 23523 bytes)

ExoPlayer warns about this mismatch at 1.2.1 where the Xing frame is used for duration and seeking (after this warning, it uses the Xing frame value instead of the input stream size):

XING data size mismatch: 17510998, 17487439

Digging into the file with ffprobe and a hex viewer it looks like the MP3 'audio' data does end at 17487439 as indicated by the Info header:

$ ffprobe -show_packets 003.mp3 
# only showing last packet and eliding irrelevant values
...
[PACKET]
...
size=1045
pos=17486394
...
[/PACKET]
pos + size = 17486394 + 1045
           = 17487439 -> matches Info frame

The data after this point seems to be some album artwork, as you can see from an ASCII interpretation of a snippet of the hex dump at this point:

.cH.T..APETAGEX.....[...............
....X......Cover Art (Front).Cover A
rt (Front).jpg.......JFIF...........

The "use Info frame from CBR" logic added in 4061d47 uses the input stream length in preference to the length from the Info frame (L579 in Mp3Extractor code snippet above) - which this file suggests is a poor decision. I will change that too.

@Tolriq
Copy link
Contributor Author

Tolriq commented May 17, 2024

For the record those are APEv1 tags.
From Wikipedia:

APEv1
The APEv1 tag was designed for the [Monkey's Audio](https://en.wikipedia.org/wiki/Monkey%27s_Audio) format.[[2]](https://en.wikipedia.org/wiki/APE_tag#cite_note-2) In [MP3](https://en.wikipedia.org/wiki/MP3) files, the APE tag is stored at the very end of the file, with no inline declaration in the body of the file. The software handles the writing and access to the tag and does not interfere with the contents of the MP3.

In case this impact other places too.

@icbaker
Copy link
Collaborator

icbaker commented May 17, 2024

Ah thanks, that's useful context.

I also did a bit of digging into this one-off 104 byte (32kbps) frame at the start of the stream, and noticed it has a PCUT identifier in a hex viewer. It seems this is maybe related to https://code.google.com/archive/p/pcutmp3/ - and there's a post here from 2006 pointing out that it can result in CBR files being considered VBR: https://hydrogenaud.io/index.php/topic,32379.msg357668.html#msg357668

@Tolriq
Copy link
Contributor Author

Tolriq commented May 17, 2024

Seems I found you a nice file for the test collection :)
Thanks for taking the time on something that I suppose is not really that common.

copybara-service bot pushed a commit that referenced this issue May 21, 2024
This shows ExoPlayer currently wrongly reports the duration of this
sample, because it assumes every frame is 32kbps (104 bytes) due to the
`PCUT` frame immediately after the `Info` frame.

A follow-up change will modify `Info` frame handling to resolve this
issue.

This sample was crafted using a hex editor to insert the additional
`PCUT` frame (the pattern of `null` and `x` is taken from the sample
file in Issue: #1376, the header is modified to set the channel count
to 1 to match the rest of the file), and then update the frame count
and data size of the `Info` header to match.

Issue: #1376
PiperOrigin-RevId: 635772837
copybara-service bot pushed a commit that referenced this issue May 22, 2024
`Info` header is used for CBR files, but in some cases not **every**
frame in these files is the same size. This change stops using the
single frame after the `Info` frame as the 'template' (and assuming all
subsequent frames are the same size/bitrate), and instead derives the
bitrate from fields in the `Info` header. This works for files which are
'almost' constant bitrate, like the one in Issue: #1376 where every
frame is either 1044 or 1045 bytes except the one immediately after the
`Info` frame which is 104 bytes (32kbps), resulting in a wildly
incorrect duration calculation.

PiperOrigin-RevId: 636151605
@icbaker
Copy link
Collaborator

icbaker commented May 22, 2024

This should be resolved by the commits linked above.

I considered some other changes and ultimately rejected them:

  • Assume the bitrate of the Info frame is the same as the rest of the file (instead of the frame after it).
    • This is true for the file you provided, but not for our existing test assets, so (sample size of 2) seems not to be a reliable heuristic.
  • Only assume a Info frame represents a CBR file if vbrInfo in the frame is 1
    • This also isn't true for our existing test file, so doesn't seem reliable either.
  • Explicitly check for a PCUT frame and skip it
    • This is only needed for an Info frame that doesn't contain frame count/data size info, which I think is probably very unusual (though technically allowed)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants