spreads-lite: A lightweight book digitization tool
Note: This is an independent fork of the original spreads project. spreads-lite is developed separately and is not affiliated with or endorsed by the original spreads project or DIYBookScanner.org.
A lightweight book digitization tool for creating digital copies of books and documents.
About
spreads-lite is a streamlined fork of the spreads project, redesigned to deliver professional book scanning results with minimal complexity. While the original spreads offers comprehensive features through an extensive plugin architecture, spreads-lite prioritizes accessibility and efficiency.
The project targets roughly 80% of professional book digitization functionality while keeping installation easy, resource use low, and operation straightforward. This makes it well suited to individuals, small libraries, and educational institutions that need reliable book digitization without enterprise-grade complexity.
Features
- Full Python 3 support (3.7+) with current dependencies
- Multi-camera support for V4L2 (USB/webcam), gphoto2 (DSLR), and CHDK (Canon PowerShot)
- GUI supports simultaneous capture from two cameras
- Integrated ScanTailor processing, Tesseract OCR, and image enhancement
- Cross-platform operation on Linux, macOS, and Windows
- Efficient performance on low-powered devices including Raspberry Pi
- Direct workflow integration without complex plugin configuration
- Both desktop GUI and command-line interfaces available
- Straightforward installation with minimal dependencies
Installation
Requirements
- Python 3.7+
- System packages: v4l2-ctl, gphoto2, libgphoto2-dev, chdkptp (depending on camera type)
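Since the required system packages depend on the camera type, it can help to check which tools are already on the PATH before installing anything. The following is a minimal sketch and not part of spreads-lite: it assumes the executables carry the names listed above (distributions may package them differently), and the backend-to-tool mapping and helper names are illustrative assumptions.

```python
import shutil

# Hypothetical mapping from camera backend to the command-line tool it relies on.
# Tool names come from the requirements list; the mapping itself is an assumption.
BACKEND_TOOLS = {
    "v4l2": "v4l2-ctl",
    "gphoto2": "gphoto2",
    "chdk": "chdkptp",
}


def check_tools(tools=None):
    """Return {backend: path or None} for each tool, looked up on PATH."""
    tools = BACKEND_TOOLS if tools is None else tools
    return {backend: shutil.which(tool) for backend, tool in tools.items()}


def format_report(status):
    """Render one line per backend, marking tools that were not found."""
    lines = []
    for backend in sorted(status):
        path = status[backend]
        lines.append(f"{backend}: {path if path else 'not found'}")
    return "\n".join(lines)
```

Calling print(format_report(check_tools())) lists each backend with the resolved path of its tool, or "not found" if the tool is missing.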
Install
pip install -r requirements.txt
Usage
GUI Interface
python lite_gui.py
The GUI supports dual camera simultaneous capture:
- Assign different cameras to left and right positions
- Simultaneous preview from both cameras
- Capture images from both cameras at once
- Images saved with left_ and right_ prefixes
Command Line Interface
Command | Description |
---|---|
python lite_cli.py --list | List available cameras |
python lite_cli.py --test --device "camera_name" | Test specific camera |
python lite_cli.py -d project_path --take --device "camera_name" | Start capture session |
python lite_cli.py -d project_path --take --count 50 --device "camera_name" | Capture 50 images |
Note: The CLI currently supports a single camera only. For dual-camera capture, use the GUI.
Complete Example
# 1. List available cameras
python lite_cli.py --list
# 2. Test camera connection
python lite_cli.py --test --device "Canon PowerShot"
# 3. Create project and capture
python lite_cli.py -d ~/my_book --take --device "Canon PowerShot"
# 4. Process captured images
python lite_cli.py -d ~/my_book --process
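The four steps above can also be driven from a script. This is a minimal sketch, assuming lite_cli.py sits in the current directory and accepts exactly the flags shown in this README; build_workflow and run_workflow are hypothetical helper names, not part of spreads-lite.

```python
import subprocess
import sys
from pathlib import Path


def build_workflow(project, device, cli="lite_cli.py"):
    """Return the four CLI invocations from the example as argument lists."""
    project = str(Path(project).expanduser())  # resolve a leading "~"
    base = [sys.executable, cli]
    return [
        base + ["--list"],                                     # 1. list cameras
        base + ["--test", "--device", device],                 # 2. test connection
        base + ["-d", project, "--take", "--device", device],  # 3. capture
        base + ["-d", project, "--process"],                   # 4. process
    ]


def run_workflow(project, device):
    """Run each step in order; stop at the first one that exits non-zero."""
    for cmd in build_workflow(project, device):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, run_workflow("~/my_book", "Canon PowerShot") runs the same sequence as the commands above. Passing arguments as a list avoids shell quoting problems with camera names that contain spaces.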
Project Structure
When you create a scanning project, spreads-lite automatically organizes files in a clear directory structure:
~/my_book/ # Your project directory
├── capture_20250110_143022/ # CLI capture session (timestamp)
│ ├── page_001.jpg # CLI single camera images
│ ├── page_002.jpg
│ └── ...
├── left_20250110_143022.jpg # GUI left camera image
├── right_20250110_143022.jpg # GUI right camera image
├── out/ # ScanTailor processed images
│ ├── 000.tif
│ ├── 001.tif
│ └── ...
├── ocr/ # OCR text extraction results
│ ├── 000.txt
│ ├── 001.txt
│ └── combined.txt # All text combined
├── output/ # Final output files
│ ├── my_book.pdf
│ ├── my_book_a4.pdf
│ └── my_book.epub # If generated
└── project.json # Project metadata and settings
Directory Purposes
Directory | Purpose | When Created |
---|---|---|
capture_YYYYMMDD_HHMMSS/ | Raw scanned images from camera(s) | During capture session |
out/ | ScanTailor processed images | After ScanTailor processing |
ocr/ | Text extraction results from Tesseract | After OCR processing |
output/ | Final PDF/EPUB files ready for use | After format conversion |
project.json | Project settings and capture metadata | When project is created |
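Because each stage writes to its own location, the presence of these paths gives a rough indication of how far a project has progressed. This is a minimal sketch, assuming the layout described above; project_status is a hypothetical helper, and it only checks that a path exists, not that the stage completed successfully.

```python
from pathlib import Path


def project_status(project_dir):
    """Map each stage to True/False based on which paths exist in the project."""
    root = Path(project_dir).expanduser()
    # CLI sessions live in capture_YYYYMMDD_HHMMSS/ directories.
    has_cli = any(p.is_dir() for p in root.glob("capture_*"))
    # GUI dual-camera captures are saved directly in the project root.
    has_gui = any(root.glob("left_*.jpg")) or any(root.glob("right_*.jpg"))
    return {
        "created": (root / "project.json").is_file(),
        "captured": has_cli or has_gui,
        "processed": (root / "out").is_dir(),
        "ocr": (root / "ocr").is_dir(),
        "converted": (root / "output").is_dir(),
    }
```

A project that has been captured and run through ScanTailor, but not yet through OCR, would report True for "created", "captured", and "processed" and False for the remaining stages.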
Finding Your Results
- Raw scans:
  - CLI mode: Look in the timestamped capture_* folder for page_XXX.jpg files
  - GUI dual mode: Look in the project root for left_YYYYMMDD_HHMMSS.jpg and right_YYYYMMDD_HHMMSS.jpg files
- Processed images: Check the out/ folder (ScanTailor output)
- Extracted text: Find individual page text in the ocr/ folder
- Final books: Your completed PDF/EPUB files are in output/
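For GUI dual-camera captures, later steps usually need the left and right images as pairs. This is a minimal sketch, assuming both images of a spread share the same YYYYMMDD_HHMMSS timestamp, as in the project structure shown earlier; that may not hold if the two cameras are triggered across a second boundary. pair_spreads is a hypothetical helper, not part of spreads-lite.

```python
from pathlib import Path


def pair_spreads(project_dir):
    """Return (left, right) path pairs matched by timestamp, oldest first."""
    root = Path(project_dir).expanduser()
    pairs = []
    # Timestamped names sort chronologically when sorted as strings.
    for left in sorted(root.glob("left_*.jpg")):
        stamp = left.stem[len("left_"):]  # e.g. "20250110_143022"
        right = root / f"right_{stamp}.jpg"
        if right.is_file():  # skip left images without a matching right image
            pairs.append((left, right))
    return pairs
```

Left images without a matching right image are skipped, so comparing len(pair_spreads(path)) with the number of left_*.jpg files is a quick way to spot incomplete spreads.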
System Requirements
- 50-100 MB RAM
- 20 MB of disk space, plus project storage
- Any modern processor
- Linux, macOS, Windows
Known Issues
ScanTailor Qt Warnings
When launching ScanTailor, you may encounter Qt-related warnings such as:
QObject::disconnect: wildcard call disconnects from destroyed signal of output::OptionsWidget
qt.qpa.plugin: Could not find the Qt platform plugin "wayland"
These warnings are harmless and do not affect ScanTailor’s functionality.
Camera-Specific Issues
- CHDK cameras require manual driver installation
- Some gphoto2 cameras may need specific configuration
- V4L2 device detection varies by system
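On Linux, V4L2 capture devices normally appear as /dev/videoN nodes, so listing them is a quick way to see what the system has detected before running the CLI. This is a minimal sketch that is independent of spreads-lite; list_video_nodes is a hypothetical helper, and a single camera may expose more than one node.

```python
import re
from pathlib import Path


def list_video_nodes(dev_dir="/dev"):
    """Return video device nodes sorted by index (video2 before video10)."""
    nodes = []
    for path in Path(dev_dir).glob("video*"):
        match = re.fullmatch(r"video(\d+)", path.name)
        if match:  # ignore entries that are not videoN
            nodes.append((int(match.group(1)), str(path)))
    return [name for _, name in sorted(nodes)]
```

An empty list means no video nodes are present in the given directory, which points to a driver or connection problem rather than a spreads-lite configuration issue.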
In Development
- Lightweight API: Optional REST interface for remote control
- Batch processing: Automated workflow capabilities
- Advanced processing: Additional image enhancement options
Feedback & Issues
Please report bugs, suggest features, or share your experience by opening an issue on the project repository.
License
This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.
Copyright
Copyright (C) 2014 Johannes Baiter johannes.baiter@gmail.com
Copyright (C) 2025 Bib Marsh cui.han@mantle-sound.org
Acknowledgments
Visual Assets: The monk images are derived from the original spreads project. If you are the copyright holder of these assets and have concerns, please contact the author at cui.han@mantle-sound.org.
Project page: https://codeberg.org/bmarsh/spreads-lite