Re: [GNU-linux-libre] youtube-dl might be running non-free software from


From: Adonay Felipe Nogueira
Subject: Re: [GNU-linux-libre] youtube-dl might be running non-free software from
Date: Wed, 06 Sep 2017 11:13:08 -0300
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)

If by "non-trivial" you mean "not so simple", then read on:

As others also did with youtube-dl, I decided to take a look at how
ViewTube does things. I chose ViewTube because I'm somewhat more used
to JavaScript (the language in which ViewTube is written) than Python
(the language in which youtube-dl is written).

For ViewTube, if the video "reference" (not the URL) has "&s=", then the
text of the possibly-non-free code in the "base.js" file is read by
ytDecryptFunction(), which turns part of it into the
ytDecryptSignature() function. However, the file itself is only
received as text; ViewTube does not execute it.
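
A minimal sketch of that check, as my own illustration and not
ViewTube's code. The pattern is the "&s=(.*?)(&|$)" one quoted further
down in this message; the name extractSignature and the sample
"reference" strings are made up:

```javascript
// Hypothetical helper (NOT ViewTube's actual code): returns the value of
// the "s" field of a video "reference", or null when there is none.
function extractSignature(reference) {
  // "(.*?)" lazily captures everything up to the next "&" or the end.
  var match = reference.match(/&s=(.*?)(&|$)/);
  return match ? match[1] : null;
}

// Made-up reference strings, only to exercise the helper.
var withSignature = "itag=22&s=ABC.DEF&type=video";
var withoutSignature = "itag=22&type=video";

console.log(extractSignature(withSignature));    // ABC.DEF
console.log(extractSignature(withoutSignature)); // null
```

Only when the result is not null would the "signature decryption"
described below be needed.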

The result of ytDecryptFunction() (that is, ytDecryptSignature()) is
very short, but it is still JavaScript code, although it apparently is
only a series of variable declarations with some math, and has no
conditionals, no loops, and no breaks.

To understand how the ytDecryptSignature() function comes into
existence, I decided to test with a video that is known to trigger the
"signature decryption". The video is:

<https://www.youtube.com/watch?v=ghQvZ9IID2A>

I downloaded its "base.js" that, at least for my case, is located at:

<https://www.youtube.com/yts/jsbin/player-vfleGnGfg/pt_BR/base.js>

In order to make ytDecryptSignature(), the ytDecryptFunction() does as
follows:

1. Declares the variables it will use. Nothing unusual so far.

2. Takes the "base.js" text and removes all line breaks. Again, nothing
   unusual.

3. In the new "base.js", looks for the name of the function that is used
   to decrypt things. For this it makes use of the
   /"signature"\s*,\s*([^\)]*?)\(/ regular expression. I confess that
   this one puzzled me for some time, especially the "([^\)]*?)\("
   part, so I'll explain more:

   - The '"signature"\s*,\s*' part will match: '"signature"' followed by
     zero or more space-like characters, followed by one comma, followed
     by zero or more space-like characters. So far this is OK.

   - The "([^\)]*?)\(" part will capture a sequence of characters that
     doesn't contain a closing parenthesis (the "([^\)]*?)" part) as
     long as such sequence is immediately before an opening parenthesis
     (the "\(" part). The "*?" in the capture and the "\(" after it
     make sure that the capture is always the shortest possible match,
     so it would capture "RE" in "RE(c" and "exampleA" in
     "exampleA(argumentA) = { if (".

     I do find it risky that they are using a rule such as "every
     character except closing parenthesis", but thankfully, almost every
     JavaScript procedure/command/function requires a pair of
     parentheses.

4. Now that it has the function name, it builds a regular expression to
   match the function itself:

   - If the function name has a "$", escape it.

   - Append "\\s*=\\s*function\\s*" to the regular expression. The
     "\\s*" will match any number of space-like characters. The rest is
     taken literally.

   - Append "\\s*\\(\\w+\\)\\s*\\{(.*?)\\}" to the regular
     expression. "\\(" and "\\)" will match parentheses literally. Same
     for "\\{" and "\\}". The "\\w+" matches a sequence of one or more
     alphanumeric or underscore characters. "(.*?)" is a capture that
     will match the shortest sequence of any characters as long as it's
     between "{" and "}".

     I'd like to point out that "\\{(.*?)\\}" could match almost
     anything, including conditionals, function calls, and pairs of
     parentheses. However, it stops at the first "}" it finds, even if
     that is in the middle of the function.

5. With this regular expression finally ready, ViewTube simply uses it to
   find the function in the modified "base.js".

6. ViewTube then travels through the modified "base.js" file so that it
   finds each part of the main function and copies the custom variables
   and functions it depends on, so as to make the result
   "self-contained".

7. Then it makes sure that the main definition is prefixed with "var "
   to declare it as a variable, and is between "try {" and "} catch(e) {
   return null }". After this, ViewTube makes a new function out of the
   result and makes sure that the first argument passed to it ends up
   in the function-local "a" variable. This "a" variable, in ViewTube,
   is a series of characters that stops before an ampersand (&) or the
   end of the text ($) (this is controlled by the "&s=(.*?)(&|$)" match
   part).
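
The steps above can be sketched roughly as follows. This is my own
re-creation under stated assumptions, not ViewTube's source: the
"baseJs" text is a made-up, tiny stand-in for the real file, the names
buildDecryptSketch and escapeDollar are hypothetical, and step 6 in
particular is heavily simplified (it copies a single helper object,
found with a pattern I made up for this example). Only the two regular
expressions of steps 3 and 4 are taken from the description above.

```javascript
// Made-up, tiny stand-in for "base.js"; the real file is far larger and
// its names change over time.
var baseJs = [
  'yE={CT:function(a,b){a.splice(0,b)},',
  'c1:function(a){a.reverse()}};',
  'Xq=function(a){a=a.split("");yE.CT(a,2);yE.c1(a);return a.join("")};',
  'c.set("signature",Xq(d.s));'
].join("\n");

// Escape "$" so a name like "$x" can be used inside a regular expression.
function escapeDollar(name) {
  return name.replace(/\$/g, "\\$");
}

// Hypothetical re-creation of the steps, NOT ViewTube's actual code.
function buildDecryptSketch(fileText) {
  // Step 2: remove all line breaks.
  var text = fileText.replace(/(\r\n|\n|\r)/g, "");

  // Step 3: the name is the shortest run of non-")" characters between
  // '"signature",' and the next "(".
  var name = text.match(/"signature"\s*,\s*([^\)]*?)\(/)[1];

  // Steps 4 and 5: build the pattern for the function, capture its body.
  var mainPattern = new RegExp(
    escapeDollar(name) + "\\s*=\\s*function\\s*" +
    "\\s*\\(\\w+\\)\\s*\\{(.*?)\\}"
  );
  var body = text.match(mainPattern)[1];

  // Step 6 (simplified assumption): take the first object whose method
  // is called in the body and copy its "name={...}};" definition.
  var helperName = body.match(/;([\w$]+)\.\w+\(/)[1];
  var helperDef = text.match(
    new RegExp(escapeDollar(helperName) + "=\\{.*?\\}\\};")
  )[0];

  // Step 7: prefix "var ", wrap in try/catch, and make a function whose
  // first argument is the local variable "a".
  var source = "try {var " + helperDef + body + "} catch(e) {return null}";
  return { name: name, source: source, run: new Function("a", source) };
}

var sketch = buildDecryptSketch(baseJs);
console.log(sketch.name);          // Xq
console.log(sketch.run("abcdef")); // fedc
```

Note that the text of "base.js" is never run as a whole here either:
only the small extracted piece is turned into a function.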

The result is similar to this (with functions inside):

try {var yE={CT:function(a,b){a.splice(0,b)},Jj:function(a,b){var c=a[0];a[0]=a[b%a.length];a[b]=c},c1:function(a){a.reverse()}};a=a.split("");yE.CT(a,3);yE.Jj(a,51);yE.Jj(a,36);yE.CT(a,3);return a.join("")} catch(e) {return null}

Which can be beautified as:

--8<---------------cut here---------------start------------->8---
try {
    var yE = {
        CT : function(a, b) {
            a.splice(0, b)
        },
        Jj : function(a, b) {
            var c = a[0];
            a[0] = a[b % a.length];
            a[b] = c
        },
        c1 : function(a) {
            a.reverse()
        }
    };
    a = a.split("");
    yE.CT(a, 3);
    yE.Jj(a, 51);
    yE.Jj(a, 36);
    yE.CT(a, 3);
    return a.join("")
} catch(e) {
    return null
}
--8<---------------cut here---------------end--------------->8---

However, none of these variations make use of conditionals such as "if"
or "case", nor of any loops; the only control flow is the calls between
the functions themselves.

Let's take the following video signature as an example:

A84A842840706E172FA34F8E6096BBE8300549E2AA8AA4.31ADA18F942258766E20BFBEE4582464226B277C

This would be the "s" value in the video "reference" and the value of
the "a" variable in the ytDecryptSignature().

1. 'a = a.split("");' makes each character an element of a new
   array. That is: a[0] = "A", a[1] = "8" and so on.

2. "a.splice(0, 3)" removes the items from 0 to 2, that is, the first 3
   items. Now, "a" is:

   A842840706E172FA34F8E6096BBE8300549E2AA8AA4.31ADA18F942258766E20BFBEE4582464226B277C

3. "var c = a[0];" takes the value of the first array element, "A", and
   saves it in the "c" variable.

   "a[0] = a[51 % a.length];" takes the length of "a" (length: 84),
   takes the remainder of 51/84 (remainder: 51), and copies the value of
   element 51 of "a" (which is "F") into element 0 of "a".

   "a[51] = c" (51 here *isn't* the remainder) sets the value of element
   51 of "a" to the previously saved value of a[0].

   Now the value of "a" is:

   F842840706E172FA34F8E6096BBE8300549E2AA8AA4.31ADA18A942258766E20BFBEE4582464226B277C

4. Do (3) again, but using the number 36 as the array element index and
   as the dividend for the remainder calculation.

   Now the value of "a" is:

   2842840706E172FA34F8E6096BBE8300549EFAA8AA4.31ADA18A942258766E20BFBEE4582464226B277C

5. "a.splice(0, 3)" removes the first three elements of the "a" array
   again.

   Now the value of "a" is:

   2840706E172FA34F8E6096BBE8300549EFAA8AA4.31ADA18A942258766E20BFBEE4582464226B277C

6. 'return a.join("")' simply joins the "a" array into a string again,
   and returns it as the result of the function.
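
The walk-through above can be reproduced with a short script. This is
only a sketch: dropFirst and swapWithFirst are my own readable names for
what the generated code calls yE.CT and yE.Jj, and the "trace" array
exists only to record the intermediate values.

```javascript
// Same behavior as the generated helpers, renamed for readability (the
// names dropFirst and swapWithFirst are mine, not YouTube's or
// ViewTube's).
function dropFirst(a, b) {      // yE.CT
  a.splice(0, b);
}
function swapWithFirst(a, b) {  // yE.Jj
  var c = a[0];
  a[0] = a[b % a.length];
  a[b] = c;
}

var s = "A84A842840706E172FA34F8E6096BBE8300549E2AA8AA4." +
        "31ADA18F942258766E20BFBEE4582464226B277C";
var a = s.split("");            // step 1
var trace = [];

dropFirst(a, 3);      trace.push(a.join(""));  // step 2
swapWithFirst(a, 51); trace.push(a.join(""));  // step 3
swapWithFirst(a, 36); trace.push(a.join(""));  // step 4
dropFirst(a, 3);      trace.push(a.join(""));  // step 5

// Print the value of "a" after each of steps 2 to 5.
trace.forEach(function (value, i) {
  console.log("after step " + (i + 2) + ": " + value);
});
```

The last line printed is the same string that step 6 returns.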

Now, as to "how often" it is non-trivial, that depends on how YouTube
decides to make the video available. Others in this thread, and also in
the one in the Trisquel forum for English speakers, have pointed out
that this signature is common in videos which contain content under the
standard copyright license, in any music video, and in any "VEVO"
video. Searching for "youtube copyright statistics" has turned up no
official document so far.

So, since there's no official documentation of the copyright statistics
of videos on YouTube, and we are dealing with non-free software, I
*think* this is non-trivial most of the time (75%). Of course, I might
be wrong. :)

Ineiev <address@hidden> writes:

> Hello,
>
>
> How often is that code non-trivial?
>

-- 
- https://libreplanet.org/wiki/User:Adfeno
- Speaker and consultant on free/libre /software/ (not to be confused
  with gratis).
- "WhatsApp"? It is not free/libre. Please use GNU Ring or Tox.
- Contact: https://libreplanet.org/wiki/User:Adfeno#vCard
- Common file formats accepted (only without DRM): Corel Draw, Microsoft
  Office, MP3, MP4, WMA, WMV.
- Common file formats accepted and sent: CSV, GNU Dia, GNU Emacs Org,
  GNU GIMP, Inkscape SVG, JPG, LibreOffice (ODF standard), OGG, OPUS,
  PDF (only without DRM), PNG, TXT, WEBM.


