amtoaer

[NAS Series, Part 2] Configuration for Fully Automatic Downloading, Scraping, and Organizing of Video Files

In the previous article, I documented the hardware assembly and system installation, ending with a working Arch Linux installation:

image

Next, I will focus on describing the use and configuration of application software. Today's article mainly discusses how to configure fully automated downloading, scraping, and organizing of media files.

Note:

  1. Readers should have a basic understanding of concepts such as BT and PT before proceeding;
  2. nastools requires an account on one of the PT sites it accepts for authentication in order to function properly; I will not go into details, so please obtain an account on your own;
  3. This article only describes the ideas and the overall process and does not cover all the background knowledge; please refer to related materials for more information;
  4. This article does not address network issues; it assumes that the docker containers can reach the internet. If they cannot, please manually configure HTTP_PROXY and HTTPS_PROXY for the containers.

Overview#

Before walking through the setup, I will briefly introduce each component of the process so that readers get the overall picture. Each component is described in its own subsection below.

Resource Sources#

Every file we download has to come from somewhere. In this article, resource sources mainly means PT sites.

Indexer#

To find resources on the site, we need an indexer. As the name suggests, an indexer is a tool that searches for user-input content on the site and returns results.

Downloader#

Once we have picked a matching resource from the index results, we hand its torrent to a BT download tool, which parses the torrent and downloads the content to local storage.

Media Server#

Once the files are downloaded locally, we need a centralized entry point to access and manage our video files, providing features like remote playback. This tool is the media server.

Scraper#

A scraper is a tool that collects information about a video, such as posters, ratings, and descriptions. Such tools typically use the video file's name to look up the matching entry on an open movie database (like TMDB) and then tag the video file with the matched information.
Media servers generally ship with a built-in scraper that tries to recognize which movie a local file corresponds to, but its matching is unreliable and often fails.
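To make the name-based matching more concrete, here is a minimal sketch of how a scraper might guess a title and year from a release-style file name before querying a database. The function name, the regular expression, and the "use the last year-like token" rule are all assumptions for illustration; real scrapers use far more elaborate rules.

```python
import re
from typing import Optional, Tuple

# Assumed heuristic: a standalone 4-digit number starting with 19 or 20 is a year.
_YEAR = re.compile(r"(?<![0-9])((?:19|20)[0-9]{2})(?![0-9])")


def guess_title_and_year(filename: str) -> Tuple[str, Optional[int]]:
    """Guess (title, year) from a release-style file name (hypothetical helper)."""
    # Drop the extension, if any.
    stem = filename.rsplit(".", 1)[0] if "." in filename else filename
    # Treat dots, underscores, and brackets as word separators.
    cleaned = re.sub(r"[._\[\]()]+", " ", stem).strip()
    matches = list(_YEAR.finditer(cleaned))
    if not matches:
        return cleaned, None
    # Use the last year-like token, so a title that itself starts with a
    # number (e.g. "2001 A Space Odyssey 1968") keeps its full title.
    last = matches[-1]
    title = cleaned[: last.start()].strip()
    return (title or cleaned), int(last.group(1))


print(guess_title_and_year("The.Matrix.1999.1080p.BluRay.x264.mkv"))
```

Everything after the year (resolution, source, codec) is simply discarded, which is also why noisy or unusual file names tend to defeat this kind of matching.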

Mover#

The mover is closely related to the scraper. As mentioned above, media servers come with scraping functionality, but the effectiveness is poor. There are mainly two ways to address this:

  • Do not use the built-in scraping of the media server; instead, manually scrape using third-party tools (like TMM);
  • Rename the video files to a simple format of "Movie Name + Year," which has a nearly one hundred percent probability of being recognized by the media server's built-in scraper.
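As a rough illustration of the second option, the sketch below builds a clean "Movie Name (Year)" path from an already known title and year. The helper name and the exact "Title (Year)/Title (Year).ext" layout are assumptions for this example, not something nastools or Emby prescribes in this article.

```python
import re
from pathlib import PurePosixPath


def library_path(title: str, year: int, ext: str) -> PurePosixPath:
    """Build 'Title (Year)/Title (Year).ext' relative to the media directory."""
    # Replace characters that are awkward or illegal in file names.
    safe = re.sub(r'[\\/:*?"<>|]+', " ", title)
    # Collapse the whitespace left behind by the replacement.
    safe = re.sub(r"\s+", " ", safe).strip()
    name = f"{safe} ({year})"
    return PurePosixPath(name) / f"{name}.{ext.lstrip('.')}"


print(library_path("The Matrix", 1999, ".mkv"))
```

With a name this regular, the built-in scraper only has to match a title and a year, which is what makes recognition so reliable.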

The second method is what I call "moving." However, the downloader must keep the downloaded files in place after the download finishes; otherwise it can no longer seed. Moving therefore comes in four variants:

  1. Move/Rename (modifies the original file directly; seeding can no longer continue)
  2. Copy (creates a second copy; seeding can continue, but the file takes up twice the disk space)
  3. Soft Link (takes no additional space, but soft links are often not treated as regular files, so they may not show up when browsing directories with an ordinary file manager)
  4. Hard Link (takes no additional space and is shown by file managers, but cannot cross file system boundaries, so it does not work across partitions)

In practice, soft links and hard links are often chosen for file moving.
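The four methods map directly onto standard library calls. The following is a minimal sketch, not the nastools implementation; the function names and the mode strings are made up for this example.

```python
import os
import shutil


def same_filesystem(a: str, b: str) -> bool:
    """Hard links only work when both paths live on the same file system."""
    return os.stat(a).st_dev == os.stat(b).st_dev


def transfer(src: str, dst: str, mode: str) -> None:
    """Place src at dst using one of the four methods (hypothetical helper)."""
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    if mode == "move":
        shutil.move(src, dst)        # original disappears, seeding breaks
    elif mode == "copy":
        shutil.copy2(src, dst)       # full second copy of the data
    elif mode == "softlink":
        # Use an absolute target so the link does not depend on where dst lives.
        os.symlink(os.path.abspath(src), dst)
    elif mode == "hardlink":
        os.link(src, dst)            # second name for the same data
    else:
        raise ValueError(f"unknown mode: {mode}")
```

Calling `same_filesystem(src, os.path.dirname(dst))` before choosing "hardlink" is one way to detect the cross-partition case early instead of failing inside `os.link`.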

Summary#

Above, I introduced each component of this process. For ease of understanding, the breakdown is finer-grained than the actual tools, many of which cover several of these roles at once.

In the following sections, I will use nastools to access the resource sources and act as indexer and mover, qBittorrent as the downloader, and Emby as the scraper and media server.

Prerequisites#

The following applications run using docker/docker compose. In Arch Linux, you can install them with the following command:

sudo pacman -S docker docker-compose

Running qBittorrent#

Run using the johngong/qbittorrent:latest image. Below is my compose.yml example:

services:
  container:
    network_mode: host
    environment:
      - UID=1000
      - GID=1000
      - UMASK=022
      - TZ=Asia/Shanghai
      - QB_WEBUI_PORT=54321  # web port
      - QB_EE_BIN=false
      - QB_TRACKERS_UPDATE_AUTO=true
      - QB_TRACKERS_LIST_URL=https://raw.githubusercontent.com/ngosang/trackerslist/master/trackers_best.txt
    volumes:
      - /home/amtoaer/Downloads:/Downloads  # download directory
      - /home/amtoaer/.config/nas/qbittorrent:/config  # configuration files
    image: johngong/qbittorrent:latest
    restart: always

Start it with docker compose up -d, then open ip:${QB_WEBUI_PORT} (54321 in this example) to reach the web UI; I will not go into its settings here:

image

Running Emby#

Emby is paid software, and the image used here is a cracked version. If you rely on it, please purchase the official version.

Use the following compose file:

services:
  container:
    network_mode: bridge
    image: lovechen/embyserver:latest
    environment:
      - UID=1000
      - GID=1000
      - GIDLIST=0
    volumes:
      - /home/amtoaer/.config/nas/emby:/config  # configuration files
      - /home/amtoaer/Videos:/Videos  # media directory (not the download directory)
    ports:
      - 54319:8096  # left side is the host port
    devices:
      # Please refer to the official documentation; the mounted content may vary
      - /dev/dri:/dev/dri
    restart: always

Similarly, run with docker compose up -d, open ip:port to access the web page, complete the initialization process, and enter the homepage (the homepage should be empty after initialization, but the layout will be consistent with this):

image

Running nas-tools#

services:
  container:
    network_mode: bridge
    ports:
      - 54317:3000  # left side is the host port
    volumes:
      - /home/amtoaer/.config/nas/nas-tools:/config  # configuration files
      - /home/amtoaer/Downloads:/Downloads  # download directory
      - /home/amtoaer/Videos:/Videos  # media directory
    environment:
      - PUID=1000
      - PGID=1000
      - UMASK=022
      - NASTOOL_AUTO_UPDATE=false
      - NASTOOL_CN_UPDATE=false
    image: nastool/nas-tools:latest
    restart: always

Start it the same way as above, open ip:port (54317 in this example), and log in:

image

Configuration#

Next, we need to configure these programs to work together. The overall workflow is as follows:

  1. Configure the PT site account in nastools and enable the corresponding indexer;
  2. Search for or subscribe to resources in nastools; when a download starts, nastools hands the task to qBittorrent;
  3. After the download completes, the mover in nastools is triggered and transfers the video files from the download directory to the media directory using the configured moving method;
  4. Emby uses the media directory as its media library; once the transfer is done, it detects the new files in the media directory and scrapes them, after which the video appears in Emby.

The details are described below.

Configure Account and Enable Indexer#

Open "Site Maintenance" in the nastools sidebar, click "Add Site" in the upper right corner, and fill in the site information:

image

Next, in the sidebar "Settings -> Indexer," enable all supported sites:

image

Set Up Downloader#

In the sidebar "Settings -> Downloader," configure the connection to your downloader:

Note:

  1. nastools runs in a docker container on the default bridge network, where 172.17.0.1 is by default the gateway address and refers to the host machine. Because qBittorrent uses host networking, it is reachable from the container at 172.17.0.1 on its web UI port.

  2. With monitoring set to "Yes," nastools triggers the move once the downloader reports that a download is complete. With "No," the move is instead triggered by listening to the download directory and reacting to new files there. I personally set it to "No" and let directory listening handle the move.

image
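If the connection test in nastools fails, it can help to check from inside the container whether the downloader's port is reachable at all. The sketch below is a plain TCP check; the helper name is made up, and the address and port in the usage line are the values used in this article (adjust them to your own setup).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 172.17.0.1:54321 is the host gateway and qBittorrent web port from this article.
    print(can_connect("172.17.0.1", 54321))
```

A `False` result here points at networking (wrong address, firewall, container network mode) rather than at the user name or password entered in nastools.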

Set Up Listening Directory#

If the downloader's monitoring setting is set to "No," you need to enable listening to the download directory in the sidebar "Settings -> Directory Sync," as shown below:

Here, I chose "Copy" as the moving method. Readers may find this strange because copying will obviously take up double the storage space, causing serious waste. Curious readers can check my explanation at the end.

image
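Conceptually, listening to the download directory boils down to noticing video files that were not there before and passing them to the mover. The following polling sketch only illustrates that idea; it is not how nastools implements directory sync, and the function name and extension list are assumptions.

```python
import os
from typing import List, Set

# Assumed set of extensions treated as video files in this sketch.
VIDEO_EXTENSIONS = {".mkv", ".mp4", ".avi", ".ts"}


def scan_new_videos(download_dir: str, seen: Set[str]) -> List[str]:
    """Return video files under download_dir that have not been seen before.

    Paths that are returned are also added to `seen`, so calling this
    function repeatedly yields each file only once.
    """
    new_files = []
    for root, _dirs, files in os.walk(download_dir):
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() not in VIDEO_EXTENSIONS:
                continue
            path = os.path.join(root, name)
            if path in seen:
                continue
            seen.add(path)
            new_files.append(path)
    return new_files
```

A real watcher would also have to make sure a file is fully downloaded before transferring it. With this extension filter, files that carry a temporary suffix while downloading are skipped until they are renamed to their final name.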

Set Up Media Server#

Click on "Settings -> Media Server" in the sidebar, select Emby, and fill in the content:

The API Key is generated through Emby's "Settings -> API Key."

image

Set Up Emby's Media Library Directory#

In Emby's "Settings -> Media Library," add a media library. An example configuration is as follows:

image

image

Summary#

With the above configuration in place, the whole process should work. Below is a demonstration:

  1. Search for the movie title in the search box and click the search button at the bottom left of the movie.

image

  2. Click on the torrent you want to download, select a category (it can also be automatic), and download.

image

  3. See that nastools automatically added a task to qBittorrent:

image

image

  4. After the download is complete, the file is transferred automatically and scraped by Emby, then shown in the media library.

image

image

image

Thus, the configuration is complete.

Further Reading#

Why choose to use a cracked version of the closed-source paid Emby instead of the open-source fork Jellyfin?#

In short, the difference in user experience is enormous.

Initially, I also had the mindset of supporting open source and first tried Jellyfin, but during use, I found at least the following issues:

  1. The overall UI is ugly and has barely changed over time.
    Although Jellyfin-Vue was recently launched, it has not yet been adopted by the clients, and the Android client ships its own font instead of using the system font, which makes for a disjointed visual experience.
  2. Unbearably high disk I/O.
    During scraping, Jellyfin reads from and writes to the disk extremely frequently. Emby is much quieter and faster than Jellyfin's ordinary scraping, even with features such as intro detection enabled, which depend on reading the video files themselves.
  3. Infuriatingly slow initial scraping.
    I once had nearly 8TB of resources, and Jellyfin took several weeks to scrape them with default settings. Looking through Jellyfin's issues, I found that this problem was reported a long time ago but has remained open for over three years.

Therefore, I abandoned Jellyfin and turned to Emby.

That said, I strongly advise against using pirated software. I started with it out of curiosity and was not ready to commit to paying, since $119 is not a small expense; after using it for this long, though, it is about time to pay up.
The RMB has depreciated considerably of late; I plan to pay once the exchange rate recovers a bit.

Why choose the copy moving method? Won't it take up extra storage space?#

As I mentioned earlier, to avoid wasting storage space the moving method is usually a choice between soft links and hard links. So why did I choose to copy? Because the btrfs file system I use on my hard drive supports copy-on-write (CoW).

In simple terms, copy-on-write in a file system means that a copied file initially consists only of new metadata, while its data blocks are shared with the original file. Only when the copy is modified are the affected data blocks actually duplicated. In our moving scenario we merely want the video file to appear in a new location and never modify it, so this mechanism fits perfectly and lets both files share the same space.

This approach avoids the drawback of soft links not being recognized in file managers, and it also loosens the restriction that hard links must stay on one physical disk or partition. This is feasible in theory, because a single btrfs file system can span multiple physical disks, and CoW copies also work across subvolumes of that file system.

However, in actual use I found that the copy mode in nastools does not take advantage of this. Checking the source code, I found that nastools copies with shutil.copy2(os.path.normpath(src), os.path.normpath(dest)), which performs a full data copy instead of sharing blocks (a block-sharing copy is also known as a reflink). I therefore replaced this part of the code with the third-party library reflink to achieve the goal.
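For readers who want to see what a reflink copy looks like without an extra dependency, here is a minimal Linux-only sketch that asks the kernel for a block-sharing clone through the FICLONE ioctl and falls back to shutil.copy2 when the file system refuses. This is not the patch I applied to nastools (that one uses the reflink library); the function name is made up, and the ioctl number is taken from the Linux headers.

```python
import fcntl
import os
import shutil

# FICLONE request number from <linux/fs.h>: _IOW(0x94, 9, int).
FICLONE = 0x40049409


def reflink_or_copy(src: str, dst: str) -> bool:
    """Copy src to dst, sharing data blocks when the file system allows it.

    Returns True if a reflink was created, False after a fallback full copy.
    """
    if os.path.exists(dst) and os.path.samefile(src, dst):
        raise ValueError("src and dst are the same file")
    try:
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            # Ask the kernel to make dst share all data blocks with src.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        shutil.copystat(src, dst)  # keep timestamps and mode, like copy2
        return True
    except OSError:
        # Not supported here (e.g. ext4, tmpfs, or different file systems):
        # fall back to an ordinary full copy.
        shutil.copy2(src, dst)
        return False
```

On btrfs the first branch should succeed and the copy returns almost instantly regardless of file size; elsewhere the function behaves like a normal copy.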

In practice, it can be seen that the video files copied via reflink do not occupy any additional space:

image

The features I introduced above are only a small part of what these tools offer. For example, nastools also supports video subscriptions, custom recognition words, custom ignore words, and more; Emby also supports intro marking, user management, video transcoding, and more. Readers can explore these features with the help of the following materials:

  1. nastools official wiki
  2. Emby documentation