Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSA/ASS subtitles - Overlapping start/end times and position tag is not handled #6320

Closed
XBigTK13X opened this issue Aug 19, 2019 · 14 comments
Closed
Assignees
Labels

Comments

@XBigTK13X
Copy link

XBigTK13X commented Aug 19, 2019

Use case description

When playing videos files with SSA/ASS subtitle tracks, styled subtitles are mashed together with conversation. For example, many animated Japanese television shows have subtitles that float on screen to translate things that aren't part of speech. In ExoPlayer, this results in the subtitles flipping back and forth between conversation and the translated symbols. This leads to neither being on screen long enough to be read.

Proposed solution

Integrate libass as a supported subtitle renderer

Alternatives considered

The only alternative would be rolling a new renderer from scratch for ExoPlayer. All major media players are currently using this library. I did look through other issues on this project about SSA/ASS subtitles, but none of them lead to a request for integrating this library as a solution.

@tonihei
Copy link
Collaborator

tonihei commented Aug 20, 2019

I had a quick look at the library and from what I see it produces overlay images including the subtitles. Is that correct? I'm afraid that's not that easy to support in ExoPlayer, because you'd need some form of special renderer that draws the overlay. See #5860 for details around writing such a renderer. It would also require an extra surface and at least some extra layer in the UI module to actually draw the overlay.

Depending on how it looks like, we could probably accept pull requests for an extension that integrates this library and provides the necessary integration layers. But we should definitely keep the basic text-only subtitle extraction for SSA/ASS we currently have. The problem you are describing sounds more like a bug in our implementation and it would be great if you can provide example media and some guidance on the expected result, so that we can fix our support for the text-only version.

@XBigTK13X
Copy link
Author

Thank you for the feedback. That points me in the right direction. I will try adding an alternative SSA renderer locally to the ExoPlayer source to see if that solves the problem for my project.

Here is a snippet from a subtitle file that renders incorrectly in the current ExoPlayer.

[Script Info]
; Script generated by Aegisub 3.2.2
; http://www.aegisub.org/
Title: ExportedSubs
ScriptType: v4.00+
WrapStyle: 0
PlayResX: 1280
PlayResY: 720
ScaledBorderAndShadow: yes
YCbCr Matrix: TV.709

[Aegisub Project Garbage]
Last Style Storage: Aikatsu 1
Audio File: yuru05_premux.mkv
Video File: yuru05_premux.mkv
Keyframes File: yuru05_premux_keyframes.log
Video AR Mode: 4
Video AR Value: 1.777778
Video Zoom Percent: 0.875000
Scroll Position: 1422
Active Line: 1437
Video Position: 33970

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Cabin,51,&H00FAFAFA,&H000019FF,&H003F1613,&HC42F100E,-1,0,0,0,100,100,0,0,1,2.4,1.1,2,150,150,40,1
Style: Default it,Cabin,51,&H00FAFAFA,&H000019FF,&H003F1613,&HC42F100E,-1,-1,0,0,100,100,0,0,1,2.4,1.1,2,150,150,40,1
Style: Default-alt,Cabin,51,&H00FAFAFA,&H00002EFF,&H00201843,&HC4171232,-1,0,0,0,100,100,0,0,1,2.4,1.1,2,150,150,40,1
Style: Signs,Lato,54,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,0,0,5,0,0,0,1
Style: OP,FOT Kafu Nagomi Std Stripped,40,&H00FFFFFF,&H000000FF,&H00BA5B2F,&H00000000,0,0,0,0,100,100,0,0,1,1,0,2,10,10,25,1
Style: OP jp,FOT Kafu Nagomi Std Stripped,40,&H00FFFFFF,&H000000FF,&H00BA5B2F,&H00000000,0,0,0,0,100,100,0,0,1,1,0,8,10,10,25,1
Style: ED,Ruzicka TypeK,40,&H00FFFFFF,&H000000FF,&H00717171,&H00000000,-1,0,0,0,100,100,0,0,1,2.1,0,7,40,10,25,1
Style: ED jp,Ruzicka TypeK,40,&H00FFFFFF,&H000000FF,&H00717171,&H00000000,-1,0,0,0,100,100,0,0,1,2.1,0,7,40,10,71,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 10,0:09:18.07,0:09:21.13,Default it,,0,0,0,,I'll start with something simple.
Dialogue: 0,0:09:21.13,0:09:30.39,Signs,,0,0,0,,{\clip()\b1\fs14.5\move(899,300,899,339,28,9245)\frz4.3\fax0.175\c&H6054FF&\blur0.6}Mega Tasty Milk
Dialogue: 10,0:09:21.71,0:09:26.30,Default it,,0,0,0,,Tonight's meal is a pasta soup that can be made with a single camping pot.
Dialogue: 10,0:09:26.30,0:09:30.39,Default it,,0,0,0,,It leaves no extra broth, making it ideal for camping and hiking.
Dialogue: 0,0:09:30.39,0:09:30.89,Signs,,0,0,0,,{\pos(191.43,146.14)\fnBurst My Bubble\b1\frz29.42\fs40\c&H040303&\blur0.65}Onions
Dialogue: 0,0:09:30.39,0:09:30.89,Signs,,0,0,0,,{\pos(494,58)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz345.8}Bacon
Dialogue: 0,0:09:30.39,0:09:30.89,Signs,,0,0,0,,{\pos(818,62)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz343.8}Asparagus
Dialogue: 0,0:09:30.39,0:09:30.89,Signs,,0,0,0,,{\pos(1020,112)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz342}Mushrooms
Dialogue: 0,0:09:30.89,0:09:31.39,Signs,,0,0,0,,{\pos(225.716,131.283)\fnBurst My Bubble\b1\frz21\fs40\c&H040303&\blur0.65}Onions
Dialogue: 0,0:09:30.89,0:09:31.39,Signs,,0,0,0,,{\pos(479,60.33)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz357}Bacon
Dialogue: 0,0:09:30.89,0:09:31.39,Signs,,0,0,0,,{\pos(792,58)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz354}Asparagus
Dialogue: 0,0:09:30.89,0:09:31.39,Signs,,0,0,0,,{\pos(1039,121)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz332.5}Mushrooms
Dialogue: 10,0:09:31.05,0:09:33.90,Default it,,0,0,0,,Take your previously chopped ingredients,
Dialogue: 0,0:09:31.39,0:09:31.89,Signs,,0,0,0,,{\pos(191.43,146.14)\fnBurst My Bubble\b1\frz29.42\fs40\c&H040303&\blur0.65}Onions
Dialogue: 0,0:09:31.39,0:09:31.89,Signs,,0,0,0,,{\pos(494,58)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz345.8}Bacon
Dialogue: 0,0:09:31.39,0:09:31.89,Signs,,0,0,0,,{\pos(818,62)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz343.8}Asparagus
Dialogue: 0,0:09:31.39,0:09:31.89,Signs,,0,0,0,,{\pos(1020,112)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz342}Mushrooms
Dialogue: 0,0:09:31.89,0:09:32.39,Signs,,0,0,0,,{\pos(225.716,131.283)\fnBurst My Bubble\b1\frz21\fs40\c&H040303&\blur0.65}Onions
Dialogue: 0,0:09:31.89,0:09:32.39,Signs,,0,0,0,,{\pos(479,60.33)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz357}Bacon
Dialogue: 0,0:09:31.89,0:09:32.39,Signs,,0,0,0,,{\pos(792,58)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz354}Asparagus
Dialogue: 0,0:09:31.89,0:09:32.39,Signs,,0,0,0,,{\pos(1039,121)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz332.5}Mushrooms
Dialogue: 0,0:09:32.39,0:09:32.89,Signs,,0,0,0,,{\pos(191.43,146.14)\fnBurst My Bubble\b1\frz29.42\fs40\c&H040303&\blur0.65}Onions
Dialogue: 0,0:09:32.39,0:09:32.89,Signs,,0,0,0,,{\pos(494,58)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz345.8}Bacon
Dialogue: 0,0:09:32.39,0:09:32.89,Signs,,0,0,0,,{\pos(818,62)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz343.8}Asparagus
Dialogue: 0,0:09:32.39,0:09:32.89,Signs,,0,0,0,,{\pos(1020,112)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz342}Mushrooms
Dialogue: 0,0:09:32.89,0:09:33.39,Signs,,0,0,0,,{\pos(225.716,131.283)\fnBurst My Bubble\b1\frz21\fs40\c&H040303&\blur0.65}Onions
Dialogue: 0,0:09:32.89,0:09:33.39,Signs,,0,0,0,,{\pos(479,60.33)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz357}Bacon
Dialogue: 0,0:09:32.89,0:09:33.39,Signs,,0,0,0,,{\pos(792,58)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz354}Asparagus
Dialogue: 0,0:09:32.89,0:09:33.39,Signs,,0,0,0,,{\pos(1039,121)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz332.5}Mushrooms
Dialogue: 0,0:09:33.39,0:09:33.90,Signs,,0,0,0,,{\pos(191.43,146.14)\fnBurst My Bubble\b1\frz29.42\fs40\c&H040303&\blur0.65}Onions
Dialogue: 0,0:09:33.39,0:09:33.90,Signs,,0,0,0,,{\pos(494,58)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz345.8}Bacon
Dialogue: 0,0:09:33.39,0:09:33.90,Signs,,0,0,0,,{\pos(818,62)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz343.8}Asparagus
Dialogue: 0,0:09:33.39,0:09:33.90,Signs,,0,0,0,,{\pos(1020,112)\fnBurst My Bubble\b1\fs40\c&H040303&\blur0.65\frz342}Mushrooms
Dialogue: 10,0:09:33.90,0:09:37.49,Default it,,0,0,0,,and fry them in olive oil and chopped garlic.
Dialogue: 10,0:09:37.90,0:09:41.28,Default it,,0,0,0,,Mix in one hundred and fifty milliliters of water,
Dialogue: 10,0:09:41.65,0:09:45.41,Default it,,0,0,0,,add one cube of consomme, and bring to a boil.

When drawn through a player using libass, it looks like this.
correct-subtitle-render

Here is a clip showing the undesirable rendering behavior.

ssa-exoplayer-behavior

Now that I look at the subtitle data, I wonder if a second rendering track at the top for Signs would be sufficient to fix the dialogue getting replaced.

@tonihei
Copy link
Collaborator

tonihei commented Aug 23, 2019

The particular reasons why this goes wrong here are:

  1. We don't expect overlapping subtitle times at the moment. We should detect that and and return all active subtitles for a given time so that they can all be displayed simultaneously.
  2. We don't parse the position tag of the subtitles to place it at the right position.

Step 1 is more important to remove the strange flickering of subtitles. Step 2 is nice to have because it would actually render the text at the right position.

@tonihei tonihei changed the title Integrate libass to support subtitle styling. SSA/ASS subtitles - Overlapping start/end times and position tag is not handled Aug 23, 2019
@ojw28
Copy link
Contributor

ojw28 commented Aug 27, 2019

Note: I think (1) is a duplicate of #4725. Should we merge that issue and this one (carrying over any necessary information into whichever issue is kept open)?

@XBigTK13X

This comment has been minimized.

@tonihei
Copy link
Collaborator

tonihei commented Sep 2, 2019

Let's keep this issue as it is newer and has more details.

Another example ass file content is in #4725 (comment).

@szaboa
Copy link
Contributor

szaboa commented Oct 27, 2019

I'd like to contribute on this if it's ok.

I've already parsed and applied the position attributes, commit:
szaboa@5791677

Regarding the overlapping issue, I am not sure what change is needed, a hint would be great. I was thinking about changing com.google.android.exoplayer2.text.ssa.SsaSubtitle#getCues or it is too late at that point? I'll make a pull request once I figure this out :)

@ojw28 ojw28 assigned icbaker and unassigned tonihei Oct 28, 2019
@ojw28
Copy link
Contributor

ojw28 commented Oct 28, 2019

Reassigning to @icbaker for subtitles ownership. @szaboa - I think we could probably accept a pull request for all (or part) of this, depending on timeline.

@tonihei
Copy link
Collaborator

tonihei commented Oct 28, 2019

Regarding the overlapping issue, I am not sure what change is needed, a hint would be great. I was thinking about changing com.google.android.exoplayer2.text.ssa.SsaSubtitle#getCues or it is too late at that point? I'll make a pull request once I figure this out :)

Subtitle.getCues can return multiple cues at the same time. So it should be possible to update the subtitle whenever the content or the number of cues changes. That means a single cues may occur in multiple subtitles if another cue is added or removed.

@szaboa
Copy link
Contributor

szaboa commented Oct 29, 2019

@tonihei

Let's say we have overlapping subtitles like this:
00:00:04 - 00:00:07 - "First subtitle"
00:00:05 - 00:00:08 - "Second subtitle"

cueTimes will be [4000000, 7000000, 5000000, 8000000]
cues will be ["First subtitle", EMPTY, "Second subtitle", EMPTY]

The first problem is that currently the TextRenderer after taking the first cue, it will wait until 7000000, because that will be returned from com.google.android.exoplayer2.text.TextRenderer#getNextEventTime.

I assume the decoder needs to be changed to build overlapping parts as different cueTimes? Or do you think this can be solved without touching the TextRenderer nor the SsaDecoder?

@tonihei
Copy link
Collaborator

tonihei commented Oct 29, 2019

The cues for your example should be:
cueTimes: [4000000, 50000000, 7000000, 8000000]
cues: [List("First subtitle"), List("First subtitle", "Second subtitle"), List("Second subtitle"), List()]

If you send the data in this format, the TextRender should be happy I think.

@szaboa
Copy link
Contributor

szaboa commented Oct 29, 2019

Following the approach you described it seems to be working fine. I've created a pull request to make it easier to show the diffs.

What do you think about the proposed solution? It can be optimized?
I plan to write a couple of unit tests once we get a final solution.

@tonihei
Copy link
Collaborator

tonihei commented Oct 30, 2019

Thanks for updating the pull request! @icbaker will have a look at the details I think.

@icbaker
Copy link
Collaborator

icbaker commented Dec 5, 2019

PR is merged

@icbaker icbaker closed this as completed Dec 5, 2019
@google google locked and limited conversation to collaborators Feb 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

6 participants