v_qq_dl: a small video downloader
28 Aug 2017
v.qq.com is a Chinese video hosting website run by Tencent, and it has many interesting TV shows. Its servers are in China, so watching them in a web browser is slow. I also usually watch TV on a Nexus Player, which does not have an app for playing online videos directly from v.qq.com.
Before I started writing my own downloader, I searched GitHub for existing ones. One of them, youtube-dl, did not support v.qq.com at the time this article was written.
Another one, you-get, does support v.qq.com. However, it only downloads at 2 KB/s when downloading directly.
So I wrote my own downloader, which uses a proxy and multithreaded downloading to increase the speed. It can also resume broken downloads.
1. HTTP partial requests
HTTP range requests allow a server to send only a portion of an HTTP message to a client. This is very useful for resuming downloads.
We can check whether a server supports partial requests by looking for the “Accept-Ranges” header in its HTTP response. My program does not implement this check yet.
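As a rough sketch of the header handling described above (not code from v_qq_dl), the helpers below check response headers for "Accept-Ranges: bytes", build a Range header, and parse the Content-Range header that comes back with a 206 Partial Content response. The function names are hypothetical. They take a plain mapping of headers, such as response.headers from requests, so they run without network access.

```python
from typing import Dict, Mapping, Optional, Tuple


def supports_byte_ranges(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers advertise byte-range support."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    accept = normalised.get("accept-ranges", "")
    # "bytes" means ranges are supported; "none" or a missing header does not promise it.
    return "bytes" in [token.strip().lower() for token in accept.split(",")]


def range_header(start: int, end: Optional[int] = None) -> Dict[str, str]:
    """Build a Range header for bytes start..end inclusive (open-ended if end is None)."""
    spec = f"bytes={start}-" if end is None else f"bytes={start}-{end}"
    return {"Range": spec}


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse a satisfied range like 'bytes 0-99/1234' into (0, 99, 1234).

    The total is None when the server reports it as '*' (unknown length).
    """
    unit, _, rest = value.partition(" ")
    if unit.lower() != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    span, _, total = rest.partition("/")
    start, _, end = span.partition("-")
    return int(start), int(end), None if total == "*" else int(total)
```

To resume a partly downloaded file, one could send range_header(os.path.getsize(path)) and append the response body to the existing file. A 200 status instead of 206 means the server ignored the range and is sending the whole file again.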
2. Using Python to download the file
requests is a very easy-to-use Python HTTP package that can be used to download files from the internet. A sample of the download code is here:
This part is the core of the downloader. However, it still has to be wrapped in multithreading, and the progress has to be monitored to support resuming.
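The sketch below is an illustration, not the author's original code. It downloads one byte range with requests and writes it at the matching offset of a preallocated file, so several threads can fill different parts of the same file. The function names and the 64 KiB chunk size are assumptions. The optional proxies dictionary is passed straight through to requests.get, which accepts a proxies argument.

```python
from typing import Iterable, Optional


def preallocate(path: str, size: int) -> None:
    """Create (or reset) a file of the given size so blocks can be written at any offset."""
    with open(path, "wb") as f:
        f.truncate(size)


def write_block(path: str, offset: int, chunks: Iterable[bytes]) -> int:
    """Write byte chunks into an existing file starting at offset; return bytes written."""
    written = 0
    # "r+b" opens for update without truncating, so other blocks are preserved.
    with open(path, "r+b") as f:
        f.seek(offset)
        for chunk in chunks:
            if chunk:  # skip empty chunks (iter_content can yield keep-alive chunks)
                f.write(chunk)
                written += len(chunk)
    return written


def download_block(url: str, path: str, start: int, end: int,
                   proxies: Optional[dict] = None) -> int:
    """Fetch bytes start..end (inclusive) of url and store them at the same offset in path."""
    import requests  # third-party; imported here so the file helpers work without it

    headers = {"Range": f"bytes={start}-{end}"}
    with requests.get(url, headers=headers, proxies=proxies,
                      stream=True, timeout=30) as resp:
        resp.raise_for_status()
        if resp.status_code != 206:
            # 200 means the server ignored the Range header and is sending the whole file.
            raise RuntimeError(f"server did not honour Range, got HTTP {resp.status_code}")
        return write_block(path, start, resp.iter_content(chunk_size=64 * 1024))
```

Writing into a preallocated file means each block lands in its final position as soon as it arrives, so no merge step is needed after all threads finish.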
3. Python multithreading
We can use either multithreading or multiprocessing to increase the download speed. To monitor the download progress, data has to be shared between the workers, which is easier in a multithreaded program, because all threads of a process share the same memory.
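To illustrate the shared-data point, here is a small self-contained sketch using the standard threading module. Each thread stands in for one block download and adds the size of every received chunk to a progress dictionary that all threads share. The names fake_download and download_all are hypothetical, and the chunks are dummy bytes rather than real HTTP data.

```python
import threading
from typing import Dict, List


def fake_download(chunks: List[bytes], progress: Dict[str, int],
                  lock: threading.Lock) -> None:
    """Stand-in for a block download: 'receive' chunks and add their size to shared progress."""
    for chunk in chunks:
        # progress is the same dict object in every thread; the lock makes the
        # read-modify-write of the counter atomic.
        with lock:
            progress["downloaded"] += len(chunk)


def download_all(blocks: List[List[bytes]]) -> Dict[str, int]:
    """Start one thread per block, wait for all of them, and return the shared progress."""
    progress = {"downloaded": 0}
    lock = threading.Lock()
    threads = [threading.Thread(target=fake_download, args=(chunks, progress, lock))
               for chunks in blocks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return progress
```

With processes instead of threads, each worker would get its own copy of progress, and the counts would have to be sent back explicitly, for example through a queue.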
In this program, I use the pathos package for multithreading, for two reasons:
- It provides a Pool for threads, which limits the number of threads.
- It uses dill, which can serialize file objects so that they can be passed as arguments.
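pathos is a third-party package, so the sketch below swaps in the standard library's multiprocessing.pool.ThreadPool, which gives the same bounded-pool behaviour described in the first bullet (pathos offers a similarly named ThreadPool in pathos.pools). It maps a hypothetical work function over a list of byte ranges with at most max_threads threads, and it records the peak number of concurrently active workers to show that the limit holds. time.sleep stands in for the actual range download.

```python
import threading
import time
from multiprocessing.pool import ThreadPool
from typing import Callable, Dict, List, Tuple

Block = Tuple[int, int]  # inclusive byte range (start, end)


def make_worker() -> Tuple[Callable[[Block], int], Dict[str, int]]:
    """Build a work function plus a dict that records how many workers run at once."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def work(block: Block) -> int:
        start, end = block
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # placeholder for the real HTTP range download
        with lock:
            state["active"] -= 1
        return end - start + 1  # number of bytes in this block

    return work, state


def download_with_pool(blocks: List[Block], max_threads: int = 4) -> Tuple[List[int], int]:
    """Process every block with at most max_threads threads; return sizes and peak concurrency."""
    work, state = make_worker()
    with ThreadPool(processes=max_threads) as pool:
        sizes = pool.map(work, blocks)  # results come back in input order
    return sizes, state["peak"]
```

One consequence of this swap: the standard-library ThreadPool hands arguments to its threads through in-memory queues rather than pickling them, so open file objects can be passed to workers without dill.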
4. Tracking progress
To resume broken downloads, the program has to track and record the download progress of each file. This program divides the media file into blocks and uses an info file to track each block. To resume a download, it examines the info file and downloads only the unfinished blocks, specifying a Range header in each HTTP request. The core download code and the monitor code are:
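The block bookkeeping can be sketched as follows. This is an illustration, not the project's actual code: the JSON info-file layout ({"done": [...]}), the block size, and the function names are all assumptions. The file is split into inclusive byte ranges, finished block indices are saved to the info file after each block completes, and on restart only the remaining ranges need Range requests.

```python
import json
import os
from typing import List, Set, Tuple

Block = Tuple[int, int]  # inclusive byte range (start, end)


def split_into_blocks(total_size: int, block_size: int) -> List[Block]:
    """Divide [0, total_size) into inclusive byte ranges of at most block_size bytes."""
    return [(start, min(start + block_size, total_size) - 1)
            for start in range(0, total_size, block_size)]


def load_done(info_path: str) -> Set[int]:
    """Return indices of finished blocks recorded in the info file (empty if none yet)."""
    if not os.path.exists(info_path):
        return set()
    with open(info_path) as f:
        return set(json.load(f)["done"])


def mark_done(info_path: str, done: Set[int], index: int) -> None:
    """Record block index as finished; hold a lock if threads share the done set."""
    done.add(index)
    tmp_path = info_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"done": sorted(done)}, f)
    # On POSIX, os.replace is an atomic rename, so a crash mid-write
    # leaves the previous info file intact instead of a half-written one.
    os.replace(tmp_path, info_path)


def pending_blocks(blocks: List[Block], done: Set[int]) -> List[Tuple[int, Block]]:
    """Blocks still to fetch, paired with their index; each becomes one Range request."""
    return [(i, b) for i, b in enumerate(blocks) if i not in done]
```

Each pending (start, end) pair maps directly to a "Range: bytes=start-end" header, so resuming only re-fetches blocks that were never recorded as finished.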
The source code of the project is here: