2025-07-10 #tools #development #book-digitization

spreads-lite: A lightweight book digitization tool

Note: This is an independent fork of the original spreads project. spreads-lite is developed separately and is not affiliated with or endorsed by the original spreads project or DIYBookScanner.org.

A lightweight book digitization tool for creating digital copies of books and documents.

About

spreads-lite is a streamlined fork of the spreads project, redesigned to deliver professional book scanning results with minimal complexity. While the original spreads offers comprehensive features through an extensive plugin architecture, spreads-lite prioritizes accessibility and efficiency.

The project aims to cover roughly 80% of what professional book digitization requires while keeping installation easy, resource use low, and operation straightforward. This makes it a good fit for individuals, small libraries, and educational institutions that need reliable book digitization without enterprise-grade complexity.

Features

Full Python 3 support (3.7+) with current dependencies
Multi-camera support for V4L2 (USB/webcam), gphoto2 (DSLR), and CHDK (Canon PowerShot)
GUI supports simultaneous capture from two cameras
Integrated ScanTailor processing, Tesseract OCR, and image enhancement
Cross-platform operation on Linux, macOS, and Windows
Efficient performance on low-powered devices including Raspberry Pi
Direct workflow integration without complex plugin configuration
Both desktop GUI and command-line interfaces available
Straightforward installation with minimal dependencies

Installation

Requirements

Python 3.7+
System packages: v4l2-ctl, gphoto2, libgphoto2-dev, chdkptp (depending on camera type)
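
Before installing, it can help to see which of the camera tools listed above are already on your PATH. The following is a minimal sketch using only the standard library. The pairing of camera type to executable is an assumption inferred from the package list, `find_camera_tools` is a hypothetical helper that is not part of spreads-lite, and libgphoto2-dev is a library rather than a command, so it is not checked here.

```python
import shutil

# Assumed pairing of camera backend to the command-line tool it relies on.
# The executable names come from the requirements list; the pairing is a guess.
CAMERA_TOOLS = {
    "v4l2": "v4l2-ctl",
    "gphoto2": "gphoto2",
    "chdk": "chdkptp",
}


def find_camera_tools(which=shutil.which):
    """Return {backend: executable path or None} for each tool in CAMERA_TOOLS."""
    return {backend: which(tool) for backend, tool in CAMERA_TOOLS.items()}


if __name__ == "__main__":
    for backend, path in find_camera_tools().items():
        status = path if path else "not found on PATH"
        print(f"{backend:8s} {status}")
```

You only need the tool that matches the camera type you plan to use.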

Install

pip install -r requirements.txt

Usage

GUI Interface

python lite_gui.py

The GUI supports simultaneous capture from two cameras:

Assign different cameras to left and right positions
Simultaneous preview from both cameras
Capture images from both cameras at once
Images saved with left_ and right_ prefixes

Command Line Interface

Command	Description
`python lite_cli.py --list`	List available cameras
`python lite_cli.py --test --device "camera_name"`	Test specific camera
`python lite_cli.py -d project_path --take --device "camera_name"`	Start capture session
`python lite_cli.py -d project_path --take --count 50 --device "camera_name"`	Capture 50 images

Note: The CLI currently supports only a single camera. For dual-camera capture, use the GUI.

Complete Example

# 1. List available cameras
python lite_cli.py --list

# 2. Test camera connection
python lite_cli.py --test --device "Canon PowerShot"

# 3. Create project and capture
python lite_cli.py -d ~/my_book --take --device "Canon PowerShot"

# 4. Process captured images
python lite_cli.py -d ~/my_book --process
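
The four steps above can also be driven from a script. The sketch below only assembles argument lists from flags that appear in this README (`--list`, `--test`, `--device`, `-d`, `--take`, `--count`, `--process`). `build_steps` and `run_steps` are hypothetical helpers, not part of spreads-lite, and the exact behaviour of each flag should be checked against `lite_cli.py` itself.

```python
import os
import subprocess
import sys


def build_steps(project_dir, device, count=None):
    """Return the CLI invocations for list, test, capture, and process."""
    cli = [sys.executable, "lite_cli.py"]
    # No shell is involved, so "~" has to be expanded here.
    project = os.path.expanduser(project_dir)
    capture = cli + ["-d", project, "--take"]
    if count is not None:
        capture += ["--count", str(count)]
    capture += ["--device", device]
    return [
        cli + ["--list"],
        cli + ["--test", "--device", device],
        capture,
        cli + ["-d", project, "--process"],
    ]


def run_steps(steps):
    """Run each command in order, stopping at the first failure."""
    for argv in steps:
        subprocess.run(argv, check=True)
```

From the directory that contains `lite_cli.py`, calling `run_steps(build_steps("~/my_book", "Canon PowerShot"))` would run the same sequence as the shell example.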

Project Structure

When you create a scanning project, spreads-lite automatically organizes files in a clear directory structure:

~/my_book/                          # Your project directory
├── capture_20250110_143022/         # CLI capture session (timestamp)
│   ├── page_001.jpg                # CLI single camera images
│   ├── page_002.jpg
│   └── ...
├── left_20250110_143022.jpg        # GUI left camera image
├── right_20250110_143022.jpg       # GUI right camera image
├── out/                            # ScanTailor processed images
│   ├── 000.tif
│   ├── 001.tif
│   └── ...
├── ocr/                            # OCR text extraction results
│   ├── 000.txt
│   ├── 001.txt
│   └── combined.txt                # All text combined
├── output/                         # Final output files
│   ├── my_book.pdf
│   ├── my_book_a4.pdf
│   └── my_book.epub                # If generated
└── project.json                    # Project metadata and settings
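
If you edit individual page files under ocr/ and want a fresh combined text, something like the sketch below can rebuild it. This is not how spreads-lite itself produces combined.txt. `combine_page_text` is a hypothetical helper, and the blank-line separator between pages and the UTF-8 encoding are assumptions.

```python
from pathlib import Path


def combine_page_text(ocr_dir, separator="\n\n"):
    """Join the numbered page files (000.txt, 001.txt, ...) in page order."""
    pages = sorted(
        # Only numeric stems count as pages, which skips combined.txt.
        (p for p in Path(ocr_dir).glob("*.txt") if p.stem.isdecimal()),
        key=lambda p: int(p.stem),
    )
    return separator.join(p.read_text(encoding="utf-8").strip() for p in pages)
```

The function returns the text rather than writing it, so you can inspect the result before overwriting anything.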

Directory Purposes

Path	Purpose	When Created
`capture_YYYYMMDD_HHMMSS/`	Raw images from a CLI capture session	During capture session
`out/`	ScanTailor processed images	After ScanTailor processing
`ocr/`	Text extraction results from Tesseract	After OCR processing
`output/`	Final PDF/EPUB files ready for use	After format conversion
`project.json`	Project settings and capture metadata	When project is created
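
Because each stage writes to its own location, you can tell how far a project has progressed by checking which locations contain files. The sketch below assumes the layout shown above. `project_status` is a hypothetical helper, not part of spreads-lite, and it does not read project.json.

```python
from pathlib import Path

# Stage name -> output folder, following the layout described above.
STAGE_DIRS = {
    "scantailor": "out",
    "ocr": "ocr",
    "export": "output",
}


def project_status(project_dir):
    """Report which stages have left files behind in the project directory."""
    root = Path(project_dir)
    status = {
        # CLI sessions live in capture_* folders; GUI images sit in the root.
        "capture": any(root.glob("capture_*/page_*.jpg"))
        or any(root.glob("left_*.jpg"))
        or any(root.glob("right_*.jpg")),
    }
    for stage, dirname in STAGE_DIRS.items():
        folder = root / dirname
        # An empty folder does not count as a completed stage.
        status[stage] = folder.is_dir() and any(folder.iterdir())
    return status
```

A stage is reported as done as soon as its folder holds at least one entry, so a partially processed book still shows up as True for that stage.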

Finding Your Results

Raw scans:
- CLI mode: Look in the timestamped capture_* folder for the page_XXX.jpg files
- GUI dual mode: Look in the project root for the left_YYYYMMDD_HHMMSS.jpg and right_YYYYMMDD_HHMMSS.jpg files
Processed images: Check the out/ folder (ScanTailor output)
Extracted text: Find per-page text in the ocr/ folder; combined.txt holds the text of all pages
Final books: Your completed PDF/EPUB files are in output/
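
For GUI captures, the two halves of a spread sit side by side in the project root, so it is often useful to pair them up before further processing. The sketch below assumes that both images of a spread carry the same timestamp, as in the directory listing above. If your captures differ by a second, the pairing would need a tolerance. `pair_gui_captures` is a hypothetical helper, not part of spreads-lite.

```python
import re
from pathlib import Path

# Matches the GUI naming described above: left_YYYYMMDD_HHMMSS.jpg / right_...
GUI_NAME = re.compile(r"^(left|right)_(\d{8}_\d{6})\.jpg$")


def pair_gui_captures(project_dir):
    """Return [(timestamp, left_path_or_None, right_path_or_None)], oldest first."""
    pairs = {}
    for path in Path(project_dir).iterdir():
        match = GUI_NAME.match(path.name)
        if match:
            side, stamp = match.groups()
            pairs.setdefault(stamp, {})[side] = path
    return [
        (stamp, sides.get("left"), sides.get("right"))
        for stamp, sides in sorted(pairs.items())
    ]
```

A `None` in either position marks a spread where only one camera produced an image.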

System Requirements

50-100 MB RAM
20 MB disk space, plus project storage
Any modern processor
Linux, macOS, Windows

Known Issues

ScanTailor Qt Warnings

When launching ScanTailor, you may encounter Qt-related warnings such as:

QObject::disconnect: wildcard call disconnects from destroyed signal of output::OptionsWidget
qt.qpa.plugin: Could not find the Qt platform plugin "wayland"

These warnings are harmless and do not affect ScanTailor’s functionality.
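
If the noise makes real errors harder to spot when you capture ScanTailor's stderr yourself, you can filter the known lines out. The sketch below matches only the two warnings quoted above. `drop_known_warnings` is a hypothetical helper, and spreads-lite does not necessarily handle ScanTailor output this way.

```python
# Prefixes of the two warnings quoted above; extend the tuple as needed.
KNOWN_HARMLESS = (
    "QObject::disconnect: wildcard call disconnects from destroyed signal",
    'qt.qpa.plugin: Could not find the Qt platform plugin "wayland"',
)


def drop_known_warnings(stderr_text):
    """Return stderr_text without lines that start with a known harmless prefix."""
    kept = [
        line
        for line in stderr_text.splitlines()
        if not line.startswith(KNOWN_HARMLESS)
    ]
    return "\n".join(kept)
```

Anything that does not match one of the prefixes is passed through unchanged, so unexpected errors remain visible.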

Camera-Specific Issues

CHDK cameras require manual driver installation
Some gphoto2 cameras may need specific configuration
V4L2 device detection varies by system

In Development

Lightweight API: Optional REST interface for remote control
Batch processing: Automated workflow capabilities
Advanced processing: Additional image enhancement options

Feedback & Issues

Please report bugs, suggest features, or share your experience by opening an issue on the project repository.

License

This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.

Copyright

Acknowledgments

Visual Assets: The monk image is derived from the original spreads project. If you are the copyright holder of this asset and have concerns, please contact the author at cui.han@mantle-sound.org.

Project page: https://codeberg.org/bmarsh/spreads-lite