degooged-tube

An adless, accountless, trackerless youtube interface with a sub box

INSTALLATION
ABOUT
CLI-USAGE
API-USAGE
DEVELOPMENT
SCRAPING
SANATIZATION
YOUTUBE-API-WRAPPERS
DATA-PIPELINE

INSTALLATION

Requires mpv for terminal playback

Install degooged-tube via pypi using pip:

pip install degooged-tube

ABOUT

Allows for youtube to be used from a terminal with no ads, without an account account, and while maintaining subbox functionality.

Can be embedded into other projects using the API in degooged_tube/ytApiHacking/__init__.py

All youtube API scraping is done internally, with the exception of getting the streaming link for videos, which is done with yt-dlp

CLI-USAGE

Launch in terminal with command

degooged-tube

Follow prompts to create a new 'user' (info is stored locally, has nothing to do with a youtube account), I recommend adding a few subs initially as it makes getting use to the cli easier.

General Interface

CLI is interactive, options are displayed on the bottom of the screen and can be selected by entering the letter in the brackets, IE w for (w)atch.

Some options IE (w)atch will show you a numbered list then prompt you for which number you want to watch, others IE (p)revious/(n)ext page will just preform the action.

API-USAGE

degooged_tube/ytApiHacking/__init__.py contains getter functions to return dataclasses, aswell as YtapiList of dataclasses containing scraped data, the data in the dataclasses is self explainitory and the YtapiList can be indexed like normal python lists, throwing IndexError if value is out of bounds. There is also YtApiList.getPaginated(pageNum, pageSize) which returns a list of elements (pageSize is upper bound)

DEVELOPMENT

To build for development run:

git clone https://github.com/PrinceOfPuppers/degooged-tube

cd degooged-tube

pip install -e .

This will build and install sync-dl in place, allowing you to work on the code without having to reinstall after changes.

Automated Testing

python test.py [options]

Will run all unit and integration tests, options are -u and -i to only run the unit/integration tests respectively.

SCRAPING

youtube will sometimes change their API, most of the time this is not an issue as the scraping engine is robust, however sometimes scrapers will be broken, as such here is a guide on how they work and how to repair them

The configuration of youtube-api hacking exists in degooged_tube/ytApiHacking/controlPanel.py

ScrapeJsonTree

The core of degooged-tube is a json scraping engine which is robust against changing apis, and easily repairable in the event of a scraper breaking.

It works by defining nodes in a json tree down to the data you wish to collect, you can specify as many or as few nodes as is needed.

Examples

"greetings":{
    "hello": "there",
    "howdy": {"name": "partner"},
    "hi":[
        { 
            "name": "alice",
            "favColor": "blue",
            "coolness": "megaRad"
        },
        { 
            "name": "bob", 
            "favColor": "green",
            "coolness": 11
        },
        { 
            "name": "carol",
            "favColor": "purple",
            "coolness": ["super", "duper", "cool"]
        },
        { 
            "name": "dave",
            "favColor": "purple"
        }
    ]
}

Example Scrapers:

ScrapeNth("howdy") (N defaults to 1 or first)
-> we would just get {"howdy": {"name": "partner"}}
ScrapeNth("howdy", collapse = True)
-> we would just get {"name": "partner"}, collapse just returns the data for a node, rather than key value pairs
ScrapeAll("name")
-> we would just get {"name": ["partner", "alice", "bob", "carol", "dave"]}
ScrapeNth("hi", [ScrapeNth("name" )])
-> we would get {"hi": {"name": "alice"}}" because we would first get "hi", and then scrape for the first "name" in said data
ScrapeNth("hi", [ScrapeNth("name", collapse=True)])
-> we would get {"hi": "alice"}", "name" has been collapsed
ScrapeNth("hi", [ScrapeNth("name"), ScrapeNth("favColor")])
-> we would get {"hi": [{"name": "alice"}, {"favColor":"blue"}]}
ScrapeAll("name", collapse=True, dataCondition=lambda data: data[1] = 'a')
-> we would get ['carol', 'dave'] as dataCondition filters all data which fail the condition
ScrapeUnion([ScrapeNth("missing key",[], collapse=True), ScrapeNth("hello",[], collapse = True)])
-> we would get "there", however if "missing key" was present we would get the data for that instead

For more examples using all nodes and arguments, see degooged_tube/ytApiHacking/controlPanel.py.

For more involved examples of how this system is used in practice, see degooged_tube/tests/unitTests.py.

Node Types

ScrapeNth:
Scrapes and returns the data for n'th occurrence of the key
ScrapeAll:
Scrapes and returns the a list of data for all occurrence of the key
ScrapeLongest:
Scrapes and returns the data with largest len(data) for all occurrence of the key
ScrapeUnion([Node1, Node2, ...]):
Will scrape for each node and return the data first one which matches
ScrapeAllUnion/ScrapeAllUnionNode:
Used together as such: ScrapeAllUnion([ScrapeAllUnionNode, ScrapeAllUnionNode, ...]), will return a list containing all matches for any of the ScrapeAllUnionNode
ScrapeElement:
The generic type of any scrape node

SANATIZATION

After scraping the data is passed into the .fromData() constructor of a dataclass (ie VideoInfo.fromData()), all of which are located in degooged_tube/ytApiHacking/controlPanel.py where it is sanitized and any missing data can be filled with placeholder data

YOUTUBE-API-WRAPPERS

There are 3 Objects which which wrap the youtube-api and allow for easy access to resources

YtInitalPage

Constructed using YtInitalPage.fromUrl(url: str) gets a page and parses it for continuation chains and initial data

The method page.scrapeInitalData(dataFmt: Union[ScrapeElement, list[ScrapeElement]]) will run scrapeJsonTree on any initial data found and return it as a dict

YtApiList

Constructed from YtInitalPage, it behaves like a list and making calls to a continuation chain as items are indexed.

The continuation chain is decided using the apiUrl, and by duck-typing based on the scrapeFmt (following only chains which match the dataFmt).

Contains YtContIter, an iterator which wraps the continuation chains, deals the requests, as well as implements the aforementioned duck-typing

DATA-PIPELINE

pipeline under the hood (note this is not a callstack, for example processInfo is called in getter, its more a sequence diagram):

                                                                youtube-api
                                                                    ||
url/query -------------> getter --> YtInitalPage --+     +--> scrapeInitalData -------> scrapeJsonTree --> processInfo ---+---> fromData --> dataclass
   |                                               +-----+                                                                |
   +--> YtInitalPage --> getter -------------------+     +--> YtApiList/YtContIter ---> scrapeJsonTree --> onExtend ------+
                                                                    ||
                                                                youtube-api

Legend:

url/query:
url or search query
YtInitalPage:
created using fromUrl()
getter:
functions in degooged_tube/ytApiHacking/__init__.py prepended by get
scrapeInitalData:
a method of YtInitalPage is passes the requisite format to by getter
YtApiList\YtContIter:
YtApiList is instantiated by getter and passes the requisite format to by getter, YtContIter in instantiated as a component of YtApiList
scrapeJsonTree:
uses the format on data requested from the youtube-api
processInfo:
the functions in degooged_tube/ytApiHacking/__init__.py called process[BLANK]Info, typically just calls fromData on the requisite dataclass
onExtend:
the functions in degooged_tube/ytApiHacking/__init__.py called [BLANK]OnExtend, typically just calls fromData on the requisite dataclass (can be overridden to inject data using OnExtendKwargs in a few cases)
fromData:
the constructors in degooged_tube/ytApiHacking/controlPanel.py which create a dataclass to hold the scraped data (ie VideoInfo.fromData()). they also sanitize and any replaces missing fields with placeholder data if possible

Name		Name	Last commit message	Last commit date
Latest commit History 108 Commits
bin		bin
degooged_tube		degooged_tube
utilities		utilities
.gitignore		.gitignore
LICENCE		LICENCE
README.md		README.md
setup.py		setup.py
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

degooged-tube

INSTALLATION

ABOUT

CLI-USAGE

General Interface

API-USAGE

DEVELOPMENT

Automated Testing

SCRAPING

ScrapeJsonTree

Examples

Node Types

SANATIZATION

YOUTUBE-API-WRAPPERS

YtInitalPage

YtApiList

DATA-PIPELINE

About

Releases

Packages

Languages

License

PrinceOfPuppers/degooged-tube

Folders and files

Latest commit

History

Repository files navigation

degooged-tube

INSTALLATION

ABOUT

CLI-USAGE

General Interface

API-USAGE

DEVELOPMENT

Automated Testing

SCRAPING

ScrapeJsonTree

Examples

Node Types

SANATIZATION

YOUTUBE-API-WRAPPERS

YtInitalPage

YtApiList

DATA-PIPELINE

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages