I did a similar thing [1] (in spirit: to save time) to be able to skim technical presentations: it creates a static HTML page with shots from the video at regular intervals, with the corresponding YouTube CC alongside. It can help you decide if a presentation is worth a watch (or just get the gist of one). Uses youtube-dl and ffmpeg under the hood.
To take your idea even further, one of the ideas I've been batting around is creating some sort of learning algorithm that runs through a video or song, skipping forward and back to a set number of random timestamps, to decide whether that video or song is something I'd be interested in based on my listening history.
I feel this is how I currently approach an unfamiliar artist or creator - skip ahead to random points in the video and if I still find them engaging, I rewatch from 0. This type of AI would be so great.
There's no AI in my program, it's as simple as it sounds: get screenshots from the video at regular intervals, collate the corresponding closed captions. One natural improvement, though, using some ML, would be to focus on the slides when it's a slide+banner+floating-head type of presentation. That wouldn't be terribly hard to implement in Python; in Haskell it would take me ages (there's Hasktorch, but I haven't tried it, and I have many more years of Python behind me).
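For reference, the screenshot half is essentially one call to ffmpeg's fps filter. A rough Python sketch (function names are made up, and glancer itself is in Haskell, so this is just to show the idea):

```python
import subprocess

def ffmpeg_screenshot_cmd(video: str, out_dir: str, every_s: int = 30) -> list:
    # One frame every `every_s` seconds, saved as shot_0001.png, shot_0002.png, ...
    return [
        "ffmpeg", "-i", video,
        "-vf", f"fps=1/{every_s}",
        f"{out_dir}/shot_%04d.png",
    ]

def grab_screenshots(video: str, out_dir: str, every_s: int = 30) -> None:
    # Thin runner; assumes ffmpeg is on PATH.
    subprocess.run(ffmpeg_screenshot_cmd(video, out_dir, every_s), check=True)
```

Pairing each shot with the caption cues whose timestamps fall in its interval is then just bookkeeping over the VTT file.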
Your idea sounds intriguing, though. I wonder how one could measure _interest_ in this regard, some form of entropy measure might be right, but how to construct it would be the fun part.
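As a strawman for the entropy angle: one could start with the Shannon entropy of the word distribution in a sampled transcript window, and compare it across random timestamps. Purely illustrative, with no claim that this actually captures "interest":

```python
import math
from collections import Counter

def word_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the word distribution in `text`."""
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A window that repeats the same few words scores near 0; richer vocabulary scores higher. Constructing a measure that correlates with what a person finds engaging is, as you say, the fun (hard) part.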
Thanks! There's nothing here that required Haskell, it's just that I'm writing all my new "custom tools" in it. Once you get the hang of a few things (parsing, running external commands, command line parsing), it's very easy to build a new tool. I imagine it would be the same with Rust (if I'm not wrong, there's a very good parser combinator library for it as well). Ping me when you are done; my Rust is very basic, but since I'd know what the code does, I would learn from it.
Any way I can reach you? Don't want to spam an answer with a very long list. The summary would be: a few talks (around 7-15) from the last Flink Forward, Ray Summit and Spark Summit (the last two of those, actually), and then some other more "random" talks in areas I'm interested in. That is the bulk of it. I also now have a section of "non-glanceable" talks, where there is more than the slides (like the Play track of the last GitHub Universe, or the recent Bill Evans documentary shared here on HN).
Haven't checked since I wrote glancer (it's a pretty recent project); I expect the normal one to eventually fix its issues, and then I'd move to it. Better not to add more complications (you already need stack, and that's kind of asking a lot).
Wow, thanks so much for this! Have to try this in Termux on my ereader (Onyx Boox Nova 3). Love the idea of comfortably reading all these tech videos on eink instead of sitting in front of a normal display even longer.
I hope it works well. The generated HTML is _large_, since I embed the images base64-encoded (to make it braindead easy to share the "presentation"); that's the only point where I can imagine an ereader having issues, but Onyx devices should have enough power to handle it.
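For the curious, the embedding itself is trivial; a sketch (not the actual glancer code, which is Haskell):

```python
import base64

def img_tag(png_bytes: bytes, alt: str = "") -> str:
    """Return an <img> tag with the PNG embedded as a base64 data URI."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return f'<img alt="{alt}" src="data:image/png;base64,{b64}">'
```

The catch is that base64 inflates each image by roughly a third (4 output bytes per 3 input bytes), which is a big part of why the single-file HTML gets so large.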
Thanks, when I was done I kind of thought the same. In the end, having the slides would be almost equivalent, but combining a decent enough transcript with the slides adds the minimum "ok, got it" to go from having to watch the video to just being able to skim over it.
I did something similar for myself a while back, to put the transcript in a text file. It's a five line bash script that uses youtube-dl to get the closed captions and cleans up the formatting.
#!/bin/bash
link="$1"
fn="captions"
youtube-dl --output "$fn.%(ext)s" --write-auto-sub --skip-download "$link"
sed -e '/-->/d' -e '/<c>/d' -e '/^[[:space:]]*$/d' "$fn.en.vtt" | uniq > "$fn.txt"
When I first saw this project, I was confused as to how it was any different from this. To me it seems like they just wrapped youtube-dl to extract only the subtitles, made it a webpage, and called it a service.
Some people will google the answer to their question, land on this webpage, read the transcript for this one video that caught their interest in the first place, and be done with their task.
If you're looking to make money / provide more value with this service I think the angle you should try for is Video to Blog Post.
The transcript is fine, but I'm not quite sure what problem this solves. If, instead, you were able to take the transcript and spit out a file broken into sections (based on YT chapters), with an attempt to automatically clean up the grammar and remove the "um"-style filler words of the spoken version, I think this would hit in a very different way.
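The filler-word half is the easy part; a regex pass like the sketch below (the word list and names are illustrative) gets you most of the way. The chapter splitting and grammar cleanup are the real work:

```python
import re

def strip_fillers(text: str) -> str:
    """Remove common spoken fillers ("um", "uh", "er(m)") and collapse whitespace."""
    cleaned = re.sub(r"\b(?:um+|uh+|er+m?)\b[,.]?", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", cleaned).strip()
```

The word boundaries (`\b`) keep it from mangling words like "number", but a real cleanup would need a curated list and some care with false positives.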
+1000. I've been trying to do this (split into sections and eliminate the umms) with recordings of live classes... it takes me 4 hours per hour of recording.
If you can make this process faster, the world will throw money at your feet... at least I will.
Second this. One thing I've found useful in quarantine is extracting cooking recipes from video transcripts. What I do now is open the YouTube transcript (sometimes unavailable), paste it into Notion, and then hand-type the sections.
Haha good one!
Yes, we do rely on YouTube's subtitles. But we are considering adding our own speech-to-text feature for videos that don't have subtitles, if there is demand ;)
I built a very similar thing with topica.io (now defunct - I could spin it back up if there's interest), but focused on sentence-level and word-level timings to create a sleek interactive transcript[0], and transcribed the videos via a third party in the background. My email's in my bio if you want to connect :)
I built something related called https://sidenote.me to take notes on YouTube videos, so I understand what you're trying to solve. Congrats on your launch, your interface looks very sleek!
My feedback about the product itself is that it's trying to do too much too soon. For example, the set-interval slider and all the features in the "Pro Toolbar": are they really useful and necessary to your users? To me it seems they're not, and they add confusion.
So the question you should ask yourself is: what is the one thing you want to solve for your users? Then make the interface do only that one thing, and do it well. Only once you grow a user base should you add new features.
1) I was curious whether their ASR software would be fast enough for a longer video, and what the WER would be. To make it easy on them, I first tried a video with no background noise. Well, bummer: all they do is scrape the closed captions -- the video I had chosen happened to have no CCs, so it didn't work at all. :-(
2) Okay, I thought, fair enough, let's try their summarization technology. So I found another video that did have closed captions. Clicked on the summary tab: "No summary generated."
So then what is this other than an automatic extractor for CCs?
This is useful. Every so often I see YouTube videos which seem appealing but are egregiously long.
I'm talking about the "Guy talks at camera" videos which often exceed 40 minutes.
Jeff Geerling or Contrapoints are huge exceptions to this because their content is engaging, well thought out and not "ranty" like the videos I'm talking about.
Maybe I have a low attention span, but at some point I'm going to be incredibly sick of this monologuing, just distill this. Please.
Have you tried it? I have the same problem as you, and this app seems to just show transcripts, which is already available on the YouTube website. This app doesn't do anything for me.
Clicking on one of the Pro functions displays a classic alert() dialog and loads the "Upgrade" page, losing the loaded video and transcript. You might want to use a custom modal with a link to the "Upgrade" page that navigates only if the user clicks on it, and maybe even opens in a new tab/page, to keep state.
According to the feature page, the pro version will additionally summarize the transcript. So in case even the transcript itself needs a tldr, this has you covered. Besides, I would guess that "tldr" is a better-known acronym than "tldw".
Wow, the transcription looks better than the automated one rev.com provided for a video where the speaker had a bit of an accent. Very cool - what transcription software is it using?
Nice tool. I tried on mobile.
I was able to get a transcript, but I ended up on the subscription page by mistake three times, and had to start over each time.
It seems to me that the free version doesn't let me do enough to get used to your tool and end up subscribing to the pro version.
Nice, it's working on mobile Android. The transcript font could be a bit smaller, and the side margins too. An option to keep the video sticky while scrolling the transcript would also be useful.
I was thinking about making something like this (with the same name!), but my take was going to be a bit more manual, with the use case of getting the 10-second answer out of a 10-minute clickbait video.
I have too many videos to watch, and this tool sounded like it could help. But it just shows the transcript, so this is not the TLDR. Maybe it's meant to be smarter, but it said "No Summary Generated". It didn't help me, as I already skim the transcript on YouTube (it's in the meatball menu next to "SAVE").
This name makes no sense. If you're reading the transcript for a Youtube video instead of, y'know, watching the video, then you can't call the transcript a TLDR. At best, it's a TLDW, though it has accessibility applications for people who also intend to watch the video.
[1]: https://github.com/rberenguel/glancer