[NAS Series, Part 2] Configuring Fully Automatic Downloading, Scraping, and Organizing of Video Files

In the previous article, I documented the hardware assembly and system installation, ending with a working Arch Linux install.

Next, I will focus on describing the use and configuration of application software. Today's article mainly discusses how to configure fully automated downloading, scraping, and organizing of media files.

Note:

Readers should have a basic understanding of concepts such as BT and PT before proceeding.

nastools requires an account on one of its authentication sites to function properly. I will not go into further detail; please obtain an account on your own.

This article only describes the ideas and the overall process and does not cover all the related background; please consult other materials for more information.

This article does not address network issues. It assumes the docker containers can reach the internet; if they cannot, please manually set HTTP_PROXY and HTTPS_PROXY for the containers.

Overview#

Before walking through the process, I will briefly introduce each component involved to give readers an overall picture. The components are described in the subsections below.

Resource Sources#

It's easy to understand that the files we want to download always have a source. In this context, our resource sources mainly refer to various PT sites.

Indexer#

To find resources on the site, we need an indexer. As the name suggests, an indexer is a tool that searches for user-input content on the site and returns results.

Downloader#

Once the indexer has returned results and we have picked a matching resource, we hand its torrent to a BT download tool, which parses the torrent and downloads the content locally.

Media Server#

Once the files are downloaded locally, we need a centralized entry point to access and manage our video files, providing features like remote playback. This tool is the media server.

Scraper#

A scraper is a tool that gathers information about a video, such as posters, ratings, and descriptions. Such tools typically use the video file's name to look up the matching entry on an open movie information site (like TMDB) and then attach the matched information to the video file.
Media servers generally come with built-in scraping that can match local files to movies, but it works poorly and often fails to find a match.
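
The name-based lookup described above can be sketched in Python as follows. This is only an illustration, not how Emby or nastools actually scrape: the regular expression, the function names, and the API key placeholder are assumptions made for this example. Only the TMDB /3/search/movie endpoint and its api_key, query, and year parameters are real.

```python
import re
from urllib.parse import urlencode

# Assumed naming pattern: a title, a separator, a four-digit year,
# then either a separator or the end of the name.
_NAME_RE = re.compile(
    r"^(?P<title>.+?)[. _(\[]+(?P<year>(?:19|20)\d{2})(?=[. _)\]]|$)"
)


def guess_title_year(filename):
    """Guess (title, year) from a file name that ends in an extension."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    match = _NAME_RE.match(stem)
    if match is None:
        # No year found: fall back to the cleaned-up stem.
        return re.sub(r"[._]+", " ", stem).strip(), None
    title = re.sub(r"[._]+", " ", match.group("title")).strip()
    return title, int(match.group("year"))


def tmdb_search_url(filename, api_key):
    """Build a TMDB movie search URL for the guessed title and year."""
    title, year = guess_title_year(filename)
    params = {"api_key": api_key, "query": title}
    if year is not None:
        params["year"] = year
    return "https://api.themoviedb.org/3/search/movie?" + urlencode(params)
```

A name such as The.Matrix.1999.1080p.BluRay.x264.mkv yields the title "The Matrix" and the year 1999. Names that do not follow the assumed pattern fall back to a title-only query, which is one reason automatic matching often fails on raw download names.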

Mover#

The mover is closely related to the scraper. As mentioned above, media servers come with scraping functionality, but the effectiveness is poor. There are mainly two ways to address this:

Do not use the built-in scraping of the media server; instead, manually scrape using third-party tools (like TMM);
Rename the video files to a simple "Movie Name + Year" format, which the media server's built-in scraper recognizes almost every time.
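
As a rough sketch of the second approach, the function below computes the renamed path for a downloaded file. The "Movie Name (Year)" folder and file layout is an assumption based on a common media-library convention, and library_path is a hypothetical helper name, not part of any tool mentioned here.

```python
from pathlib import Path


def library_path(source, media_root, title, year):
    """Return media_root/Title (Year)/Title (Year).ext for a downloaded file."""
    source = Path(source)
    name = f"{title} ({year})"
    # Keep the original extension so the container format stays recognizable.
    return Path(media_root) / name / (name + source.suffix)


target = library_path(
    "/Downloads/The.Matrix.1999.1080p.BluRay.x264.mkv", "/Videos", "The Matrix", 1999
)
print(target)
```

The function only computes the destination; how the file actually gets there is a separate choice.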

The second method is what I call "moving." However, the downloader not only has to download the video files; it also needs them to remain in place afterwards, otherwise it cannot seed. Moving can therefore be divided into four types:

Move/Rename (modifies the original file directly, so seeding cannot continue)
Copy (makes a second copy, so seeding can continue, but the file takes up twice the disk space)
Soft Link (takes up no additional space, but soft links are often not recognized as regular files, so they may not show up when browsing directories with an ordinary file manager)
Hard Link (takes up no additional space and is recognized by file managers, but cannot cross disk partitions or file systems)

In practice, soft links and hard links are often chosen for file moving.
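
The four moving types above map onto standard file operations. The sketch below shows one way to express them with Python's standard library; the transfer function and its method names are hypothetical and do not reflect how nastools implements its mover.

```python
import os
import shutil
from pathlib import Path


def transfer(src, dst, method):
    """Place src at dst using one of the four moving types."""
    src, dst = Path(src), Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    if method == "move":
        # The original disappears, so the downloader can no longer seed it.
        shutil.move(str(src), str(dst))
    elif method == "copy":
        # Independent second copy: seeding continues, disk usage doubles.
        shutil.copy2(src, dst)
    elif method == "softlink":
        # dst only stores a path pointing at src; no extra data is written.
        os.symlink(src.resolve(), dst)
    elif method == "hardlink":
        # dst becomes a second name for the same data; this raises OSError
        # if src and dst are on different file systems.
        os.link(src, dst)
    else:
        raise ValueError(f"unknown method: {method}")
    return dst
```

After a hard link, both paths report the same inode number (os.stat(path).st_ino), which is a quick way to confirm that no extra space was used.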

Summary#

Above, I introduced each component of this process. For ease of understanding, the breakdown is fairly fine-grained; in practice, a single tool often covers several of these roles at once.

In the following sections, I will use nastools to access the resource sources and to act as the indexer and mover, qbittorrent as the downloader, and emby as the scraper and media server.

Prerequisites#

The following applications run using docker/docker compose. In Arch Linux, you can install them with the following command:

sudo pacman -S docker docker-compose

Running Qbittorrent#

Run using the johngong/qbittorrent:latest image. Below is my compose.yml example:

services:
  container:
    network_mode: host
    environment:
      - UID=1000
      - GID=1000
      - UMASK=022
      - TZ=Asia/Shanghai
      - QB_WEBUI_PORT=54321  # web port
      - QB_EE_BIN=false
      - QB_TRACKERS_UPDATE_AUTO=true
      - QB_TRACKERS_LIST_URL=https://raw.githubusercontent.com/ngosang/trackerslist/master/trackers_best.txt
    volumes:
      - /home/amtoaer/Downloads:/Downloads  # download directory
      - /home/amtoaer/.config/nas/qbittorrent:/config  # configuration files
    image: johngong/qbittorrent:latest
    restart: always

Start it with docker compose up -d, then open ip:${QB_WEBUI_PORT} (54321 in the example above) to reach the web page; I will not elaborate on it here.

Running Emby#

Emby is paid software, and the image used here is a cracked build. If you need it, please purchase the official version.

Use the following compose file:

services:
  container:
    network_mode: bridge
    image: lovechen/embyserver:latest
    environment:
      - UID=1000
      - GID=1000
      - GIDLIST=0
    volumes:
      - /home/amtoaer/.config/nas/emby:/config  # configuration files
      - /home/amtoaer/Videos:/Videos  # media directory (not the download directory)
    ports:
      - 54319:8096  # left side is the host port
    devices:
      # Please refer to the official documentation; the mounted content may vary
      - /dev/dri:/dev/dri
    restart: always

Similarly, start it with docker compose up -d, open ip:54319 (the host port mapped above) to reach the web page, and complete the initialization process. The homepage will be empty right after initialization.

Running nas-tools#

services:
  container:
    network_mode: bridge
    ports:
      - 54317:3000  # left side is the host port
    volumes:
      - /home/amtoaer/.config/nas/nas-tools:/config  # configuration files
      - /home/amtoaer/Downloads:/Downloads  # download directory
      - /home/amtoaer/Videos:/Videos  # media directory
    environment:
      - PUID=1000
      - PGID=1000
      - UMASK=022
      - NASTOOL_AUTO_UPDATE=false
      - NASTOOL_CN_UPDATE=false
    image: nastool/nas-tools:latest
    restart: always

Start it the same way as above, open ip:54317 (the host port mapped above), and log in.

Configuration#

Next, we need to configure these programs to work together. The overall workflow is as follows:

Configure the PT site account in nastools and enable the corresponding indexer;
Search for or subscribe to resources in nastools, which hands the download tasks to Qbittorrent;
After a download completes, the mover in nastools is triggered and transfers the video files from the download directory to the media directory using the configured moving method;
Emby uses the media directory as its media library; once the transfer is complete, it detects the new files there and triggers scraping, after which the video appears in Emby.

The details are described below.

Configure Account and Enable Indexer#

Open "Site Maintenance" in the nastools sidebar, click "Add Site" in the upper right corner, and fill in the site-related information.

Next, in the sidebar under "Settings -> Indexer," enable all supported sites.

Set Up Downloader#

In the sidebar under "Settings -> Downloader," configure the connection to your downloader.

Note:

nastools runs in docker with bridge networking, and 172.17.0.1 is the default gateway of docker's default bridge network, which represents the host machine. Because the qbittorrent container above uses host networking, nastools can reach it at 172.17.0.1:54321.

Setting monitoring to "Yes" triggers the move as soon as a download completes. Setting it to "No" means nastools instead listens to the download directory and triggers the move whenever new files appear there. I set it to "No" and let the directory listener handle the move.
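
If the downloader connection fails, it can help to test the address by hand. The sketch below builds a login request against the qBittorrent Web API (POST /api/v2/auth/login with username and password form fields). The helper name qb_login_request is hypothetical, the host and port come from the setup in this article, and the credentials are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def qb_login_request(host, port, username, password):
    """Build (but do not send) a qBittorrent Web API login request."""
    base = f"http://{host}:{port}"
    body = urlencode({"username": username, "password": password}).encode()
    return Request(
        base + "/api/v2/auth/login",
        data=body,
        headers={"Referer": base},  # the WebUI may reject requests without it
        method="POST",
    )


# Placeholder credentials; replace with your own WebUI account.
request = qb_login_request("172.17.0.1", 54321, "admin", "secret")
print(request.full_url)
```

Sending the request with urllib.request.urlopen(request, timeout=5) from inside the nastools container shows whether the address is reachable there; a connection error usually points to the network setup rather than to nastools itself.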

Set Up Listening Directory#

If the downloader's monitoring setting is "No," you need to enable listening on the download directory in the sidebar under "Settings -> Directory Sync."

Here, I chose "Copy" as the moving method. Readers may find this strange, since copying obviously doubles the storage used and is a serious waste. Curious readers can find my explanation at the end.
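
To make the idea of listening to the download directory concrete, here is a minimal sketch that copies newly appeared video files into the media directory. It is not how nastools works internally: it polls the directory rather than using file system notifications, and the sync_once name, the suffix list, and the flat destination layout are all assumptions for illustration.

```python
import shutil
from pathlib import Path

# Assumed set of extensions treated as video files.
VIDEO_SUFFIXES = {".mkv", ".mp4", ".avi"}


def sync_once(download_dir, media_dir, seen):
    """Copy video files not seen before; return the newly created copies."""
    copied = []
    for src in sorted(Path(download_dir).rglob("*")):
        if not src.is_file() or src.suffix.lower() not in VIDEO_SUFFIXES:
            continue
        if src in seen:
            continue  # already handled in an earlier pass
        dst = Path(media_dir) / src.name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # "Copy" moving method: the original stays for seeding
        seen.add(src)
        copied.append(dst)
    return copied
```

Calling sync_once in a loop with a short sleep between passes, and keeping the same seen set across calls, approximates a listener. A real one would also need to wait until a file has finished downloading before copying it.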