spreads-lite: A lightweight book digitization tool

Note: This is an independent fork of the original spreads project. spreads-lite is developed separately and is not affiliated with or endorsed by the original spreads project or DIYBookScanner.org.

A lightweight book digitization tool for creating digital copies of books and documents.

About

spreads-lite is a streamlined fork of the spreads project, redesigned to deliver professional book scanning results with minimal complexity. While the original spreads offers comprehensive features through an extensive plugin architecture, spreads-lite prioritizes accessibility and efficiency.

The project aims to cover roughly 80% of professional book digitization functionality while keeping installation easy, resource usage low, and operation straightforward. This makes it well suited to individuals, small libraries, and educational institutions that need reliable book digitization without enterprise-grade complexity.

Features

  • Full Python 3 support (3.7+) with current dependencies
  • Multi-camera support for V4L2 (USB/webcam), gphoto2 (DSLR), and CHDK (Canon PowerShot)
  • GUI supports simultaneous capture from two cameras
  • Integrated ScanTailor processing, Tesseract OCR, and image enhancement
  • Cross-platform operation on Linux, macOS, and Windows
  • Efficient performance on low-powered devices including Raspberry Pi
  • Direct workflow integration without complex plugin configuration
  • Both desktop GUI and command-line interfaces available
  • Straightforward installation with minimal dependencies

Installation

Requirements

  • Python 3.7+
  • System packages, depending on camera type: v4l2-ctl (part of v4l-utils) for V4L2, gphoto2 and libgphoto2-dev for gphoto2, chdkptp for CHDK
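
Before installing, it can help to confirm which of these prerequisites are already present. The following is a minimal sketch, not part of spreads-lite: `check_prerequisites` and `CAMERA_TOOLS` are hypothetical names, and the executable names are assumed to match the tool names in the list above (the chdkptp launcher in particular may be named differently on some systems).

```python
import shutil
import sys

# Executable names assumed from the requirements list; adjust for your system.
CAMERA_TOOLS = {
    "V4L2 (USB/webcam)": "v4l2-ctl",
    "gphoto2 (DSLR)": "gphoto2",
    "CHDK (Canon PowerShot)": "chdkptp",
}


def check_prerequisites(min_python=(3, 7), which=shutil.which):
    """Return (python_ok, {backend: executable path or None})."""
    python_ok = sys.version_info >= min_python
    tools = {backend: which(exe) for backend, exe in CAMERA_TOOLS.items()}
    return python_ok, tools


if __name__ == "__main__":
    python_ok, tools = check_prerequisites()
    print(f"Python >= 3.7: {'yes' if python_ok else 'no'}")
    for backend, path in tools.items():
        print(f"{backend}: {path or 'not found on PATH'}")
```

Only the tool for the camera type you actually use needs to be found.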

Install

pip install -r requirements.txt

Usage

GUI Interface

python lite_gui.py

The GUI supports simultaneous capture from two cameras:

  • Assign different cameras to left and right positions
  • Simultaneous preview from both cameras
  • Capture images from both cameras at once
  • Images saved with left_ and right_ prefixes

Command Line Interface

| Command | Description |
| --- | --- |
| python lite_cli.py --list | List available cameras |
| python lite_cli.py --test --device "camera_name" | Test a specific camera |
| python lite_cli.py -d project_path --take --device "camera_name" | Start a capture session |
| python lite_cli.py -d project_path --take --count 50 --device "camera_name" | Capture 50 images |

Note: The CLI currently supports a single camera only. For dual-camera capture, use the GUI.

Complete Example

# 1. List available cameras
python lite_cli.py --list

# 2. Test camera connection
python lite_cli.py --test --device "Canon PowerShot"

# 3. Create project and capture
python lite_cli.py -d ~/my_book --take --device "Canon PowerShot"

# 4. Process captured images
python lite_cli.py -d ~/my_book --process
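
The same four steps can be driven from a script. This is a rough sketch under stated assumptions: `build_commands` and `run_workflow` are hypothetical helpers, lite_cli.py is assumed to be in the current directory, and the CLI is assumed to exit with a non-zero status on failure. Only flags shown in this README are used.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(project, device, count=None, script="lite_cli.py"):
    """Return the example's CLI invocations as argument lists."""
    project = str(Path(project).expanduser())
    take = [sys.executable, script, "-d", project, "--take"]
    if count is not None:
        take += ["--count", str(count)]
    take += ["--device", device]
    return [
        [sys.executable, script, "--list"],                      # 1. list cameras
        [sys.executable, script, "--test", "--device", device],  # 2. test camera
        take,                                                    # 3. capture
        [sys.executable, script, "-d", project, "--process"],    # 4. process
    ]


def run_workflow(project, device, count=None):
    """Run each step in order, stopping at the first one that fails."""
    for cmd in build_commands(project, device, count):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_workflow("~/my_book", "Canon PowerShot", count=50)
```

Passing arguments as a list avoids shell quoting problems with camera names that contain spaces.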

Project Structure

When you create a scanning project, spreads-lite automatically organizes files in a clear directory structure:

~/my_book/                          # Your project directory
├── capture_20250110_143022/         # CLI capture session (timestamp)
│   ├── page_001.jpg                # CLI single camera images
│   ├── page_002.jpg
│   └── ...
├── left_20250110_143022.jpg        # GUI left camera image
├── right_20250110_143022.jpg       # GUI right camera image
├── out/                            # ScanTailor processed images
│   ├── 000.tif
│   ├── 001.tif
│   └── ...
├── ocr/                            # OCR text extraction results
│   ├── 000.txt
│   ├── 001.txt
│   └── combined.txt                # All text combined
├── output/                         # Final output files
│   ├── my_book.pdf
│   ├── my_book_a4.pdf
│   └── my_book.epub                # If generated
└── project.json                    # Project metadata and settings
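
Because CLI capture sessions are named by timestamp, the most recent session can be found from the directory names alone. A small sketch, assuming the capture_YYYYMMDD_HHMMSS pattern shown above; `session_time` and `latest_session` are hypothetical helpers, not part of spreads-lite.

```python
from datetime import datetime
from pathlib import Path


def session_time(capture_dir):
    """Parse the timestamp from a capture_YYYYMMDD_HHMMSS directory name."""
    return datetime.strptime(Path(capture_dir).name, "capture_%Y%m%d_%H%M%S")


def latest_session(project):
    """Return the newest capture_* directory in a project, or None."""
    root = Path(project).expanduser()
    sessions = [p for p in root.glob("capture_*") if p.is_dir()]
    return max(sessions, key=session_time, default=None)
```

A directory that starts with capture_ but does not follow the timestamp pattern raises ValueError in this sketch rather than being skipped silently.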

Directory Purposes

| Directory | Purpose | When Created |
| --- | --- | --- |
| capture_YYYYMMDD_HHMMSS/ | Raw scanned images from camera(s) | During capture session |
| out/ | ScanTailor processed images | After ScanTailor processing |
| ocr/ | Text extraction results from Tesseract | After OCR processing |
| output/ | Final PDF/EPUB files ready for use | After format conversion |
| project.json | Project settings and capture metadata | When project is created |
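
Since each stage writes to its own location, counting files per location gives a quick picture of how far a project has progressed. The sketch below assumes the layout described above; the stage names and `project_status` are illustrative only and not part of spreads-lite.

```python
from pathlib import Path

# Glob patterns follow the documented layout; stage names are illustrative.
STAGES = {
    "captured": ("capture_*/page_*.jpg", "left_*.jpg", "right_*.jpg"),
    "processed": ("out/*.tif",),
    "ocr": ("ocr/*.txt",),   # note: also counts combined.txt
    "exported": ("output/*",),
}


def project_status(project):
    """Count the files each stage has produced so far."""
    root = Path(project).expanduser()
    return {
        stage: sum(
            1
            for pattern in patterns
            for path in root.glob(pattern)
            if path.is_file()
        )
        for stage, patterns in STAGES.items()
    }


if __name__ == "__main__":
    for stage, n in project_status("~/my_book").items():
        print(f"{stage:10s} {n} file(s)")
```

A stage whose directory does not exist yet is simply reported as 0.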

Finding Your Results

  • Raw scans:
    • CLI mode: Look in the timestamped capture_* folder with page_XXX.jpg files
    • GUI dual mode: Look in project root for left_YYYYMMDD_HHMMSS.jpg and right_YYYYMMDD_HHMMSS.jpg files
  • Processed images: Check the out/ folder (ScanTailor output)
  • Extracted text: Find per-page text files in the ocr/ folder; combined.txt holds the text of all pages
  • Final books: Your completed PDF/EPUB files are in output/
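
For GUI dual-camera captures, the left and right images of a spread can be matched by the timestamp in their file names. A minimal sketch, assuming both images of a spread carry the same timestamp (as in the example tree above, but not otherwise confirmed); `pair_spreads` is a hypothetical helper.

```python
import re
from pathlib import Path

# Matches left_YYYYMMDD_HHMMSS.jpg and right_YYYYMMDD_HHMMSS.jpg
SIDE_RE = re.compile(r"^(left|right)_(\d{8}_\d{6})\.jpg$")


def pair_spreads(project):
    """Return (timestamp, left_path, right_path) tuples in capture order.

    A side that is missing for a given timestamp is returned as None.
    """
    sides_by_stamp = {}
    for path in Path(project).expanduser().iterdir():
        match = SIDE_RE.match(path.name)
        if match:
            side, stamp = match.groups()
            sides_by_stamp.setdefault(stamp, {})[side] = path
    return [
        (stamp, sides_by_stamp[stamp].get("left"), sides_by_stamp[stamp].get("right"))
        for stamp in sorted(sides_by_stamp)
    ]
```

Entries with a None side are worth checking by hand, since they may indicate a failed capture on one camera or timestamps that differ between the two sides.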

System Requirements

  • 50-100 MB RAM
  • 20 MB disk space, plus project storage
  • Any modern processor
  • Linux, macOS, Windows

Known Issues

ScanTailor Qt Warnings

When launching ScanTailor, you may encounter Qt-related warnings such as:

QObject::disconnect: wildcard call disconnects from destroyed signal of output::OptionsWidget
qt.qpa.plugin: Could not find the Qt platform plugin "wayland"

These warnings are harmless and do not affect ScanTailor’s functionality.
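
If you capture ScanTailor's stderr in your own wrapper script and want cleaner logs, these two messages can be filtered out. This is only a sketch: `drop_harmless_qt_warnings` is a hypothetical helper, not part of spreads-lite, and it deliberately matches only the exact messages quoted above so that other Qt errors remain visible.

```python
# Prefixes of the two warnings documented above; nothing else is filtered.
HARMLESS_PREFIXES = (
    "QObject::disconnect: wildcard call disconnects from destroyed signal",
    'qt.qpa.plugin: Could not find the Qt platform plugin "wayland"',
)


def drop_harmless_qt_warnings(stderr_text):
    """Return stderr_text without the known harmless Qt warning lines."""
    kept = [
        line
        for line in stderr_text.splitlines()
        if not line.startswith(HARMLESS_PREFIXES)
    ]
    return "\n".join(kept)
```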

Camera-Specific Issues

  • CHDK cameras require manual driver installation
  • Some gphoto2 cameras may need specific configuration
  • V4L2 device detection varies by system
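
When V4L2 detection behaves unexpectedly on Linux, it can help to see which video device nodes the system exposes, independently of spreads-lite. A minimal sketch; `list_video_devices` is a hypothetical helper and applies to Linux only, where V4L2 devices appear as /dev/videoN.

```python
from pathlib import Path


def list_video_devices(dev_dir="/dev"):
    """Return V4L2 device nodes (video0, video1, ...) sorted by index."""
    nodes = [
        path
        for path in Path(dev_dir).glob("video*")
        if path.name[5:].isdigit()  # keep only names of the form videoN
    ]
    return sorted(nodes, key=lambda path: int(path.name[5:]))


if __name__ == "__main__":
    devices = list_video_devices()
    if not devices:
        print("No /dev/video* nodes found")
    for node in devices:
        print(node)
```

A single physical camera may expose more than one node, so the number of nodes is not necessarily the number of cameras.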

In Development

  • Lightweight API: Optional REST interface for remote control
  • Batch processing: Automated workflow capabilities
  • Advanced processing: Additional image enhancement options

Feedback & Issues

Please report bugs, suggest features, or share your experience by opening an issue on the project repository.

License

This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.

Copyright (C) 2014 Johannes Baiter <johannes.baiter@gmail.com>
Copyright (C) 2025 Bib Marsh <cui.han@mantle-sound.org>

Acknowledgments

Visual Assets: The monk image is derived from the original spreads project. If you are the copyright holder of this asset and have concerns, please contact the author at cui.han@mantle-sound.org.


Project page: https://codeberg.org/bmarsh/spreads-lite