Author Topic: SGU Transcripts Project  (Read 1773 times)


Offline rwh

  • Off to a Start
  • *
  • Posts: 31
  • SGU Transcriber
    • roblog
SGU Transcripts Project
« on: Apr 12, 2012, 06:00:46 AM »
Hi All,

I’ve been thinking for a while that it’d be really good to have transcripts of SGU episodes to facilitate linking, searching and accessibility. But I couldn’t find any other project that is undertaking this, other than this old thread:

http://sguforums.com/index.php/topic,12548.0.html

So I decided to put my time where my mouth is, and just do some transcribing.  It takes me about three times the amount of time to do a transcript as the podcast’s length, so generally a couple of nights’ work for a single SGU podcast. I guess I might be able to manage about two podcasts a week, and given that there are currently a bit over 350 podcasts in the archive, and a new one released each week, it’ll probably take me about seven years to finish them all! So… I created a wiki:
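The seven-year figure can be checked with a quick back-of-the-envelope calculation. This is only a sketch of the arithmetic above; the function name and the 52-weeks-per-year rounding are assumptions for illustration, and it supposes the pace of two transcripts a week holds steady while one new episode appears each week.

```python
def years_to_clear_backlog(backlog, transcribed_per_week, released_per_week,
                           weeks_per_year=52):
    """Estimate how many years it takes to catch up with the archive."""
    # Each week the backlog shrinks only by the difference between
    # what gets transcribed and what gets newly released.
    net_per_week = transcribed_per_week - released_per_week
    if net_per_week <= 0:
        raise ValueError("the backlog never shrinks at this pace")
    return backlog / net_per_week / weeks_per_year


# Figures from the post: about 350 episodes, 2 transcribed and 1 released per week.
years = years_to_clear_backlog(backlog=350, transcribed_per_week=2, released_per_week=1)
print(f"{years:.1f} years")  # prints "6.7 years"
```

So the net progress is one episode a week, which is why every extra volunteer makes such a big difference to the total.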

http://www.sgutranscripts.org

I'd really appreciate it if anyone wanted to pitch in and do some transcribing.  Otherwise I'll probably be at it for a long time!  It's simple to do, I just use VLC media player, dial down the speed, and type out what I'm listening to in a text editor. A quick alt-tab, then shift-left a couple of times skips back a few seconds if I miss something. Then I spell check and add a bit of wiki markup and paste it into the wiki where it's public for all to use.  :)

Anyway, I'm really just getting started, but I hope this can be useful to the community.

Cheers,

Rob
SGUTranscripts - Transcripts of the Skeptics' Guide.

Offline WC

  • Frequent Poster
  • ******
  • Posts: 2302
  • inflammable means flammable?
Re: SGU Transcripts Project
« Reply #1 on: Apr 14, 2012, 03:46:33 PM »
My first thought is to take advantage of something like Google Voice for transcription, or some commercial dictation software, to do the lion's share of the work... But then my next thought is how time-consuming it would be to go in and edit the result to properly mark who is saying what.

Offline seaotter

  • Drunkenly yelling LITTLE WING!
  • Planetary Skeptic
  • *
  • Posts: 28735
  • My homunculus is an atheist.
Re: SGU Transcripts Project
« Reply #2 on: Apr 14, 2012, 03:48:42 PM »
I'll volunteer for an episode if we want to do it the hard way.
"There is no use trying," said Alice; "one can't believe impossible things." Lewis Carroll

Offline WC

  • Frequent Poster
  • ******
  • Posts: 2302
  • inflammable means flammable?
Re: SGU Transcripts Project
« Reply #3 on: Apr 14, 2012, 04:38:09 PM »
Yeah, I can totally knock off an episode or two after the end of the month. Plenty of free time come May, probably, unless my academic advisor and program director decide I have to do more, but I don't think they will. I needs me some freaking down time.

Offline rwh

  • Off to a Start
  • *
  • Posts: 31
  • SGU Transcriber
    • roblog
Re: SGU Transcripts Project
« Reply #4 on: Apr 15, 2012, 03:23:22 PM »
Yeah, when I was thinking about starting up this project I did some research on that, and apparently even the best tools (like Dragon Naturally Speaking) don't do well when they can't be trained, and when there are multiple different voices speaking.  Accuracy rates are generally supposed to be down around 80%.  The open source tools that I found weren't really anywhere near being able to be used by regular users.  And yeah, add to that the editing time required and it doesn't really seem worth it.

Though I guess really all we've got here is a wiki, any transcription technique can be used, and I'd be interested to hear if anyone is able to do better with voice recognition.  Certainly the Google Voice thing is an interesting idea, though I've not heard that there's actually a public API for that yet.

Thanks for volunteering to do some transcripts though, the more the merrier! :)

Offline seaotter

  • Drunkenly yelling LITTLE WING!
  • Planetary Skeptic
  • *
  • Posts: 28735
  • My homunculus is an atheist.
Re: SGU Transcripts Project
« Reply #5 on: Apr 15, 2012, 04:01:50 PM »
Which one are you doing now? And which have been done?

Offline rwh

  • Off to a Start
  • *
  • Posts: 31
  • SGU Transcriber
    • roblog
Re: SGU Transcripts Project
« Reply #6 on: Apr 15, 2012, 04:08:32 PM »
I just finished #352, and am about to go to bed so I'm not doing any right now.  Teleuteskitty is doing 348.  If you look at the front page of sgutranscripts.org, you'll see the ones that are done or in progress (right now episodes 1 and 350-352 with 348 in progress).  If you put in a holding page for the one you're doing with a note saying that you're doing it, we can avoid duplication.

-Rob

Offline WC

  • Frequent Poster
  • ******
  • Posts: 2302
  • inflammable means flammable?
Re: SGU Transcripts Project
« Reply #7 on: Apr 15, 2012, 09:11:05 PM »
Figured you'd have thunk it all. As much as I love GV, I don't think it's up to transcription. Well, won't know till I plug an episode into it. All in all, slowing it down in VLC and doing it manually is probably the only way to do it.

Even though I'm around Vegas during TAMs, I can't afford it. Poor student after all, can't even really contribute financially to the show or buy swag. What I do have is a typing speed of 85 WPM, and some free time every 4 or so months. I've been such a huge fan for so long, I've practically thrown myself at Jay as a web and Flash developer, just to pitch in what I can. I can do this, and gladly too. Also, total excuse to go back through the episodes :D

Offline Jay_One

  • Seasoned Contributor
  • ****
  • Posts: 515
Re: SGU Transcripts Project
« Reply #8 on: Apr 16, 2012, 09:57:17 AM »
This seems like something I could get behind. I'll look into transcribing software when I get home in the hopes there's an easier way to start a new speaker's line rather than manually typing R:, S:, etc. every time.
"I intend to live forever, or die trying." - Groucho Marx

Offline rwh

  • Off to a Start
  • *
  • Posts: 31
  • SGU Transcriber
    • roblog
Re: SGU Transcripts Project
« Reply #9 on: Apr 17, 2012, 10:21:53 AM »
Sounds good.  There was some software mentioned in the previous thread from 2008 but I haven't really looked into it.

http://sguforums.com/index.php/topic,12548.0.html

Offline Neko-chan

  • Brand New
  • Posts: 9
Re: SGU Transcripts Project
« Reply #10 on: Apr 17, 2012, 12:19:02 PM »
I'm not an expert on speech recognition software, but from what I've seen, it looks like this may take just as long, what with multiple voices, frequent overlap of speech and the speed at which they talk. Correcting for these may be just as time consuming, but I'd be interested to hear if anything turns up.

I'm not a very fast typist, but I'll soldier on with the hard way for now, until a better solution is found.

I've also been wondering about including other features with the transcripts to make them as useful as possible. Like time stamps for each section, and maybe a list of facts from the episode (a kind of TIL list), as they're always dropping in interesting tidbits. Things that are easy for transcribers to include, but may increase the utility of the pages for people who don't intend to read the whole transcript.
I figured it would be best to address this early on so we don't have to go back through transcriptions later.
Thanks, guys!
(Teleuteskitty)
« Last Edit: Apr 17, 2012, 12:26:52 PM by Neko-chan »

Offline Jay_One

  • Seasoned Contributor
  • ****
  • Posts: 515
Re: SGU Transcripts Project
« Reply #11 on: Apr 17, 2012, 02:51:13 PM »
I have myself a new hobby. I used TranscriberAG for my last one, but it's awfully fiddly to get used to. It lets you easily cut up the episodes into various segments, and those segments into sections. There are incredibly useful shortcuts and hotkeys for pausing, rewinding, and changing speaker, but my only grievance is that the export format is different to the standard format on the website, i.e. whereas we put S:, the software uses ***S***, and it also puts timestamps at the start of each segment, which I have to manually edit out.
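That manual clean-up could probably be scripted. The following is only a rough sketch: the exact TranscriberAG export layout isn't shown in this thread, so it assumes each segment may start with a timestamp such as `[00:01:23]` or `1:23`, and that speakers appear as `***S***`. The function name `to_wiki_format` and both patterns are made up for illustration and would need adjusting against a real export.

```python
import re

# Assumed speaker marker: ***S***, optionally followed by a colon.
SPEAKER = re.compile(r"\*\*\*([A-Za-z]+)\*\*\*:?\s*")
# Assumed timestamp at the start of a line, e.g. "[00:01:23]" or "1:23".
TIMESTAMP = re.compile(r"^\s*\[?\d{1,2}:\d{2}(?::\d{2})?(?:\.\d+)?\]?\s*")


def to_wiki_format(line):
    """Drop a leading timestamp and turn ***S*** into the wiki's S: style."""
    line = TIMESTAMP.sub("", line)
    line = SPEAKER.sub(lambda m: f"{m.group(1)}: ", line)
    return line.rstrip()


print(to_wiki_format("[00:01:23] ***S*** Hello and welcome."))
# prints "S: Hello and welcome."
```

Running every line of the export through a function like this would leave only the S:, J:, etc. prefixes. One caveat: a line of actual speech that begins with something like "10:30" would lose that prefix too, so the output still needs a read-through.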

Edit: Also, is there a page template people can use? There will always be Intro, This Day in Skepticism, News Items, Interview, Who's that Noisy, Science or Fiction and Skeptical Quote of the Week. It's easy enough to add titles for Emails and Updates, but it would be useful to have the rest of it already in place when starting a new transcription.
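As a stopgap, a skeleton with the recurring segments could be generated with a few lines of script. This is a sketch, not an agreed template: it assumes the wiki accepts MediaWiki-style `== Heading ==` markup, and the names `STANDARD_SECTIONS` and `episode_skeleton` are invented here.

```python
# Segments listed above as appearing in every episode.
STANDARD_SECTIONS = [
    "Intro",
    "This Day in Skepticism",
    "News Items",
    "Interview",
    "Who's that Noisy",
    "Science or Fiction",
    "Skeptical Quote of the Week",
]


def episode_skeleton(sections=STANDARD_SECTIONS):
    """Return empty section headings, one per segment, separated by blank lines."""
    return "\n\n".join(f"== {name} ==" for name in sections) + "\n"


print(episode_skeleton())
```

Passing a different list, for example one with Emails or Updates added, gives a skeleton for episodes with extra segments.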
« Last Edit: Apr 17, 2012, 03:03:05 PM by Jay_One »

Offline Neko-chan

  • Brand New
  • Posts: 9
Re: SGU Transcripts Project
« Reply #12 on: Apr 18, 2012, 11:21:55 AM »
Edit: Also, is there a page template people can use? ...
I've put a link to a draft template I was using in the community portal.

Also, I've just started playing around with ExpressScribe and it's speeding the process up no end.

You can assign system-wide hotkeys to play, pause, step fwd/back (adjustable), speed up/down (adjustable), etc. so you can control it whilst staying in your text editor.

Plus, the speed changes retain the original pitch - I don't know if this is the case with VLC, I've got an old version.
And it's free.

Offline Jay_One

  • Seasoned Contributor
  • ****
  • Posts: 515
Re: SGU Transcripts Project
« Reply #13 on: Apr 18, 2012, 12:10:39 PM »
ExpressScribe is great. The system wide hotkeys mean I can use my preferred text editor.

What does everyone think about adding hyperlinks to the transcriptions? I've noticed in a few transcriptions there are links for certain people or items mentioned. It's been brought up on the community talk page that there are external Wikipedia links and internal wiki links, though I'm not sure which things in the episode are worth links.

I also thrive on rules and regulations, and I think it's important we lay down a set now, nice and early. Such as adding timestamps to the subheading in <small> tags, which I think has the advantage of being seen in the table of contents. The other option is to vote on whether or not to use wikiboxes containing a picture, Skeptical Quote of the Week, timestamp lists and links to the forum topic, show notes and download, all in one place.

I repeat myself here on the forums in case we have anybody interested in this project who doesn't check the talk page.

Edit: What are your opinions on transcribing sentence fillers? I miss out the "um"s and the like, but I've been including the "you know"s and similar. We could transcribe the episodes accurately, or just what's necessary. That's not to say I think we should miss out things like (laughter), or should we?

Edit2: To add to this, I choose to rewrite what they said if they stumble over their words so it makes a proper sentence. If I wrote word-for-word what was said, there would occasionally be a nonsensical sentence.
« Last Edit: Apr 18, 2012, 03:11:50 PM by Jay_One »

Offline rwh

  • Off to a Start
  • *
  • Posts: 31
  • SGU Transcriber
    • roblog
Re: SGU Transcripts Project
« Reply #14 on: Apr 19, 2012, 04:00:31 AM »
What does everyone think about adding hyperlinks to the transcriptions? I've noticed in a few transcriptions there are links for certain people or items mentioned. It's been brought up on the community talk page that there are external Wikipedia links and internal wiki links, though I'm not sure which things in the episode are worth links.

I've mostly been putting in links when I've needed to look up something for spelling.  But it's a good question... I think perhaps if there's a skeptical/scientific concept or person of note then it's probably worth a link.  I guess if you look at Wikipedia, we could use their linking policy as a good first approximation?

I also thrive on rules and regulations, and I think it's important we lay down a set now, nice and early. Such as adding timestamps to the subheading in <small> tags, which I think has the advantage of being seen in the table of contents. The other option is to vote on whether or not to use wikiboxes containing a picture, Skeptical Quote of the Week, timestamp lists and links to the forum topic, show notes and download, all in one place.

Haha, I thrive on almost the opposite, a more organic process.  But I'd like to work on a set of guidelines for those things. :)  So here are my votes:
- I personally don't really like the timestamps in the headings, I think it unduly emphasises them. I think if someone needs to find a section in the podcast it's easy enough for them to click on the section heading to take them to the section where they can find the timestamp.  This is the format I've been using, say here:

http://www.sgutranscripts.org/wiki/SGU_Episode_352#Aristolochia_Nephropathy

I don't feel too strongly about it though, so if everyone likes the timestamps in the headings I'm happy to change.

- I like the idea of wiki boxes, but I think that we shouldn't make them too complex.  They should be easy to add and give the most important summary info.  Specific notes on the sections:
 * Images: I like the idea. We'd need to get permission to copy or link to the images from the podcasts, though.  Should we just email info@theskepticsguide.org and ask for permission?
 * Timestamp lists: I think I prefer them in the contents and suspect it'd be a lot of work to duplicate them into the wiki box.  Am I contradicting myself now?  ;)  Now that I've thought about it more it does seem to make better sense to have them in the headings... :)
 * Links to the show notes, podcast and forum:  Great idea to move them to the wiki box.  In that vein, how about moving the skeptical rogues to the box as well?  And I've been thinking we should have a separate section, "Guests" and "Guest Rogues" for people who have those statuses to distinguish them from the regularly appearing Skeptical Rogues.

There's also the possibility of developing one of those footer things with, say previous/next episode to link the transcripts together in a sequence.  We also haven't used any Categories yet... I guess we're just getting started with learning the whole wiki thing...

Edit: What are your opinions on transcribing sentence fillers? I miss out the "um"s and the like, but I've been including the "you know"s and similar. We could transcribe the episodes accurately, or just what's necessary. That's not to say I think we should miss out things like (laughter), or should we?

I tend to only transcribe them if I feel like they add some meaning to the sentence, using my feeling at the time.  I think what we should go for is trying to make what we see as the original meaning as clear as possible.  I tend to leave out the "um"s and "you know"s most of the time, but leave in "like" as in "they were like" (to mean "they said") or when it's used in its literal sense.  I do the same for interjections; there's often a quiet "yeah" from someone in the background (usually Evan) when someone else is talking, and I tend to leave those out.  But if they're loud or I want to make sure that it's clear that everyone is agreeing with something then I transcribe them.  I do the same for laughter, because it's important in understanding that something isn't to be read literally as there is often a lot of sarcasm in the humour that they use.  I also do my best to transcribe overlapping speech as it's often when people are most animated.  It can often be hard to understand people, so I put (inaudible) when I can't understand someone.

Edit2: To add to this, I choose to rewrite what they said if they stumble over their words so it makes a proper sentence. If I wrote word-for-word what was said, there would occasionally be a nonsensical sentence.

I use a very light-on approach to rewording.  I think it's important to be careful about overly "interpreting" what they say, I think I'd prefer to be literal and let the reader draw their own conclusions about what they meant.  I find that you can often add commas where they've respoken or whatever, which helps a lot with understanding.  Of course, this really isn't super-critical as the podcast is the canonical source.  So really I'm pretty relaxed about this too.

I think what I really want to emphasise is that I'd like to keep the barrier to entry as low as possible.  The last thing we should do is burden people with scary rules or lots of required wiki markup.  If someone wants to just paste in unformatted text without any wiki markup at all I think we should encourage that.  Because the real work here is in getting even a first-pass transcription done.  There is just so much material to get through that we need all the help we can get.  It's much, much easier and faster to go in and do a proof-read of an existing transcript (it could be done at full-speed, even) than it is to do the original transcript.  Adding wiki-markup and timestamps, wiki boxes, all of that is likewise very easy to do after we have a basic transcript.

Now... what about US English vs International...? :P