November, 2025

🗂️ FileSense

AI-powered local file organizer that sorts documents by meaning, not just type. Uses semantic embeddings, FAISS indexing, and OCR to intelligently categorize your chaotic files.

Quick Overview

Tired of your Downloads folder looking like a digital junkyard? FileSense uses sentence embeddings to understand what each file is about and moves it into the matching folder automatically.

🧠 Semantic Understanding

Reads file content and understands meaning using SentenceTransformers embeddings.

⚡ Lightning Fast

FAISS indexing provides fast similarity searches locally on your machine.

👁️ OCR Enabled

Automatically extracts text from scanned PDFs and image-only documents.

🕵️ Fully Offline

Works entirely offline: nothing leaves your device, so your files stay private.

🔄 Real-time Watch

Detects and organizes new files automatically as they appear.

🖥️ Easy GUI

Desktop launcher with start/stop controls, logs, and system tray support.

How It Works

FileSense follows a four-step pipeline: extract text, generate embeddings, find best match, move file.
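The four steps can be sketched as one function. This is a minimal illustration, not FileSense's actual process_file.py: `organize_file`, `extract_text`, and `classify` are hypothetical names, and `classify` stands in for both the embedding and best-match steps, returning a folder name or None when nothing clears the threshold.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional, Union

PathLike = Union[str, Path]


def organize_file(
    path: PathLike,
    dest_root: PathLike,
    extract_text: Callable[[PathLike], str],
    classify: Callable[[str], Optional[str]],
) -> Optional[Path]:
    """Extract text, classify it, and move the file; return the new path or None."""
    text = extract_text(path)                 # step 1: extract text (plain read, PDF parser, OCR)
    if not text.strip():
        return None                           # nothing to classify, leave the file in place
    label = classify(text)                    # steps 2-3: embed the text, find the best folder
    if label is None:
        return None                           # no folder cleared the similarity threshold
    target_dir = Path(dest_root) / label
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(path).name
    shutil.move(str(path), str(destination))  # step 4: move the file
    return destination
```

Passing the steps in as functions keeps the move logic testable without loading a transformer model.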

1️⃣ Create FAISS Index

Build a semantic search index from your folder descriptions:

python scripts/create_index.py
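To make the index step concrete, here is a dependency-free sketch of what a flat inner-product index over folder descriptions does. It assumes folder_labels.json maps folder names to descriptions (the real schema may differ), and `toy_embed` is a hypothetical hashed bag-of-words stand-in for SentenceTransformer embeddings (all-mpnet-base-v2 actually produces 768-dimensional vectors). With FAISS, the usual equivalent is `faiss.IndexFlatIP` over vectors normalized with `faiss.normalize_L2`, though create_index.py may use a different index type.

```python
import hashlib
import json
import math
from typing import Dict, List, Tuple

DIM = 256  # toy size; real sentence embeddings are larger


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a SentenceTransformer: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FlatIPIndex:
    """Brute-force inner-product index; on unit vectors the score is cosine similarity."""

    def __init__(self) -> None:
        self.labels: List[str] = []
        self.vectors: List[List[float]] = []

    def add(self, label: str, vector: List[float]) -> None:
        self.labels.append(label)
        self.vectors.append(vector)

    def search(self, query: List[float], k: int = 1) -> List[Tuple[str, float]]:
        scored = [
            (label, sum(q * v for q, v in zip(query, vec)))
            for label, vec in zip(self.labels, self.vectors)
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:k]


def build_index(labels_json: str) -> FlatIPIndex:
    """Embed each folder's name plus description (assumed {folder: description} JSON)."""
    labels: Dict[str, str] = json.loads(labels_json)
    index = FlatIPIndex()
    for folder, description in labels.items():
        index.add(folder, toy_embed(f"{folder} {description}"))
    return index
```

Because every vector is L2-normalized, the inner product equals cosine similarity, which is why an inner-product index works for semantic matching.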

2️⃣ Process Files in Bulk

Scan and classify all files with optional multithreading:

python scripts/script.py --dir ./files --threads 8
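A bounded thread pool is the usual way to implement an option like `--threads`. The sketch below assumes a hypothetical `worker` callable that processes one file (for instance, a wrapped version of the pipeline step); it caps concurrency at `threads` and records per-file exceptions instead of aborting the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, TypeVar, Union

T = TypeVar("T")


def process_all(
    paths: Iterable[str],
    worker: Callable[[str], T],
    threads: int = 4,
) -> Dict[str, Union[T, Exception]]:
    """Run `worker` on every path with at most `threads` calls running at once."""
    results: Dict[str, Union[T, Exception]] = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(worker, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one unreadable file should not stop the batch
                results[path] = exc
    return results
```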

3️⃣ Watch a Folder in Real Time

Automatically organize new files as they arrive:

python scripts/watcher_script.py --dir ./files
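A watcher has to cope with browsers writing downloads to a temporary file first. The helper below is an assumption about how that can be handled, not FileSense's code: it skips common in-progress suffixes (.tmp, Chrome's .crdownload, Firefox's .part) and waits until the file size stops changing. In a watchdog `FileSystemEventHandler`, a check like this would typically run from `on_created` and from `on_moved`, since browsers rename the finished file into place.

```python
import os
import time
from pathlib import Path
from typing import Union

# Suffixes commonly used for in-progress downloads (assumed list, extend as needed)
TEMP_SUFFIXES = {".tmp", ".crdownload", ".part"}


def is_temporary(path: Union[str, Path]) -> bool:
    return Path(path).suffix.lower() in TEMP_SUFFIXES


def wait_until_stable(path: Union[str, Path], interval: float = 0.5,
                      checks: int = 3, timeout: float = 30.0) -> bool:
    """Return True once the size is unchanged for `checks` consecutive polls."""
    deadline = time.monotonic() + timeout
    last_size, stable = -1, 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except FileNotFoundError:  # renamed or deleted while we were waiting
            return False
        if size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            last_size, stable = size, 0
        time.sleep(interval)
    return False


def ready_to_process(path: Union[str, Path], interval: float = 0.5,
                     checks: int = 3, timeout: float = 30.0) -> bool:
    return not is_temporary(path) and wait_until_stable(path, interval, checks, timeout)
```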

4️⃣ Launch with GUI

Simple desktop interface with logs and tray control:

python scripts/launcher.py
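A launcher's start/stop buttons can be backed by a small process controller. This sketch assumes the GUI runs the watcher as a child process; `WatcherProcess` is a hypothetical name, and the real launcher.py may run the watcher in a thread instead.

```python
import subprocess
from typing import List, Optional


class WatcherProcess:
    """Start and stop a background organizer process, as a GUI's buttons might."""

    def __init__(self, command: List[str]) -> None:
        self.command = command
        self.proc: Optional[subprocess.Popen] = None

    def is_running(self) -> bool:
        return self.proc is not None and self.proc.poll() is None

    def start(self) -> None:
        if not self.is_running():
            self.proc = subprocess.Popen(self.command)

    def stop(self, timeout: float = 5.0) -> None:
        proc = self.proc
        if proc is not None and proc.poll() is None:
            proc.terminate()                 # ask politely first
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()                  # force it if it ignores the request
                proc.wait()
        self.proc = None
```

A GUI could construct `WatcherProcess([sys.executable, "scripts/watcher_script.py", "--dir", "./files"])` once and wire its `start` and `stop` methods to the buttons, escalating to `kill()` only if the process does not exit within the timeout.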

Project Structure

FileSense/
├── scripts/
│   ├── create_index.py      # Build FAISS index from folder labels
│   ├── process_file.py      # Extract, classify, and move files
│   ├── script.py            # Bulk organizer with threading
│   ├── watcher_script.py    # Real-time folder monitoring
│   ├── launcher.py          # GUI desktop app
│   └── multhread.py         # Multithreading handler
├── folder_labels.json       # Folder names and descriptions
├── folder_embeddings.faiss  # (auto-generated) FAISS vector index
└── files/                   # Drop unorganized files here

Core Features

Feature | Description
🧠 Semantic Sorting | Understands file content using transformer embeddings, not just names.
⚡ FAISS Indexing | Builds a fast semantic search index for folder labels.
👁️ OCR Fallback | Extracts text from scanned/image PDFs using pdfplumber + pytesseract.
🧩 Keyword Boosting | Gives weight bonuses for subject-specific terms.
🧡 Multithreading | Handles multiple files simultaneously for faster processing.
🕵️ Real-time Watcher | Detects and organizes files automatically as they appear.
🖥️ GUI Launcher | Desktop interface with controls, logs, and system tray icon.
🔐 Offline Privacy | Works entirely offline; nothing leaves your device.
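Keyword boosting can be layered on top of the similarity scores. In this sketch the keyword lists, `BONUS_PER_HIT`, and `MAX_BONUS` are invented for illustration; only the 0.45 threshold comes from the configuration table. Capping the bonus keeps a few keyword hits from overriding a clearly better semantic match.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical keyword lists; FileSense's real terms and weights are not shown here.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "Physics": ("momentum", "velocity", "quantum"),
    "Finance": ("invoice", "tax", "receipt"),
}
BONUS_PER_HIT = 0.05  # assumed bonus per matched keyword
MAX_BONUS = 0.15      # assumed cap so keywords refine, not replace, semantic scores
THRESHOLD = 0.45      # default from the configuration table


def boost_scores(text: str, scores: Dict[str, float]) -> Dict[str, float]:
    """Add a capped bonus to each folder's score for its keywords found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    boosted = {}
    for label, score in scores.items():
        hits = sum(1 for kw in KEYWORDS.get(label, ()) if kw in words)
        boosted[label] = score + min(hits * BONUS_PER_HIT, MAX_BONUS)
    return boosted


def pick_folder(text: str, scores: Dict[str, float]) -> Optional[str]:
    """Return the best folder after boosting, or None if it falls below THRESHOLD."""
    boosted = boost_scores(text, scores)
    best = max(boosted, key=boosted.__getitem__)
    return best if boosted[best] >= THRESHOLD else None
```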

Installation & Setup

Requirements

Python 3.8+ with the following libraries:

pip install sentence-transformers faiss-cpu numpy pdfplumber pytesseract pillow python-docx watchdog pystray

pytesseract is only a wrapper around the Tesseract OCR engine, which must be installed separately. On Debian/Ubuntu:

sudo apt install tesseract-ocr
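A quick way to confirm the setup is to check that each package can be found and that the tesseract binary is on PATH. This is an optional helper, not part of the repository; note that several import names differ from the pip package names (faiss-cpu imports as faiss, pillow as PIL, python-docx as docx).

```python
import importlib.util
import shutil
from typing import Dict

# pip package name -> import name; several of these differ
REQUIRED_MODULES: Dict[str, str] = {
    "sentence-transformers": "sentence_transformers",
    "faiss-cpu": "faiss",
    "numpy": "numpy",
    "pdfplumber": "pdfplumber",
    "pytesseract": "pytesseract",
    "pillow": "PIL",
    "python-docx": "docx",
    "watchdog": "watchdog",
    "pystray": "pystray",
}


def check_environment() -> Dict[str, bool]:
    """Report which dependencies are importable and whether tesseract is on PATH."""
    status = {
        package: importlib.util.find_spec(module) is not None
        for package, module in REQUIRED_MODULES.items()
    }
    status["tesseract (binary)"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```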

Quick Start

  1. Clone the repo: git clone https://github.com/ahhyoushh/filesense.git && cd filesense
  2. Edit folder_labels.json with your desired folder names and descriptions
  3. Create the index: python scripts/create_index.py
  4. Drop files into the files/ folder and run the organizer
  5. Choose your method: bulk process, real-time watch, or GUI launcher

πŸ’‘ Pro Tip: Use the GUI launcher for the easiest experience. It handles everything with a simple interface.

Configuration Options

Setting | File | Description
--dir / -d | script.py / watcher_script.py | Directory to scan or watch
--threads / -t | script.py | Maximum concurrent threads (default: 4)
THRESHOLD | process_file.py | Minimum similarity score to accept a match (default: 0.45)
MODEL_NAME | create_index.py | SentenceTransformer model (default: all-mpnet-base-v2)
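The two command-line options in the table map naturally onto argparse. This is a sketch of how they could be declared, not the scripts' actual parser; the ./files default for --dir is an assumption. THRESHOLD and MODEL_NAME appear to be constants edited in the source files rather than flags, so they are left out.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Organize files by meaning.")
    parser.add_argument("--dir", "-d", default="./files",  # assumed default
                        help="Directory to scan or watch")
    parser.add_argument("--threads", "-t", type=int, default=4,
                        help="Maximum concurrent threads")
    return parser.parse_args(argv)
```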

What I Learned

🧠 Embeddings & NLP

  • How browsers download files in stages (writing to temporary files such as .tmp before the final file appears) and how to handle them
  • SentenceTransformer embeddings enable semantic matching beyond keywords
  • Embeddings capture meaning and context in high-dimensional space
  • Keyword boosting improves accuracy for domain-specific classification

⚡ Vector Search at Scale

  • FAISS provides fast nearest-neighbor searches locally
  • Building indexes scales better than brute-force similarity
  • Indexing trades memory for speed, which suits real-time processing

🎯 OCR & Document Processing

  • Combining pdfplumber + pytesseract handles varied PDF formats
  • Fallback strategies are critical for messy real-world data
  • OCR preprocessing (contrast, rotation) significantly improves accuracy
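The text-first, OCR-second fallback can look roughly like this. It is a sketch rather than the project's extractor: `MIN_TEXT_CHARS` is an assumed cutoff for deciding that a page has no usable text layer, and preprocessing is limited to grayscale plus autocontrast (rotation correction is left out). The pdfplumber, pytesseract, and Pillow imports sit inside the function so the helper module can be imported without them.

```python
from typing import List

MIN_TEXT_CHARS = 20  # assumed cutoff: fewer characters suggests a scanned page


def needs_ocr(text: str, min_chars: int = MIN_TEXT_CHARS) -> bool:
    """Treat a page as image-only when its text layer is (nearly) empty."""
    return len(text.strip()) < min_chars


def extract_pdf_text(path: str, min_chars: int = MIN_TEXT_CHARS) -> str:
    """Use pdfplumber's text layer per page, falling back to Tesseract OCR."""
    import pdfplumber
    import pytesseract
    from PIL import ImageOps

    pages: List[str] = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            text = page.extract_text() or ""
            if needs_ocr(text, min_chars):
                image = page.to_image(resolution=300).original  # render page as a PIL image
                image = ImageOps.autocontrast(ImageOps.grayscale(image))  # simple cleanup
                text = pytesseract.image_to_string(image)
            pages.append(text)
    return "\n".join(pages)
```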

⚙️ Concurrent Processing

  • Multithreading improves throughput for I/O-bound file operations
  • Thread pools prevent resource exhaustion and improve stability
  • Race conditions require careful synchronization in file operations
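One race appears as soon as two threads move files with the same name into the same folder: both see that the name is free, and one overwrites the other. A minimal sketch, assuming a single process with several threads, is to hold a lock across the check-then-move; `safe_move` and the `name (1).ext` naming scheme are hypothetical. A thread lock does not protect against a second process, such as the watcher and the bulk script running at the same time.

```python
import shutil
import threading
from pathlib import Path
from typing import Union

_move_lock = threading.Lock()


def unique_destination(folder: Path, name: str) -> Path:
    """Return folder/name, or folder/'stem (n).ext' if that name is already taken."""
    candidate = folder / name
    stem, suffix = candidate.stem, candidate.suffix
    n = 1
    while candidate.exists():
        candidate = folder / f"{stem} ({n}){suffix}"
        n += 1
    return candidate


def safe_move(src: Union[str, Path], folder: Union[str, Path]) -> Path:
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    with _move_lock:  # choosing the name and moving must happen as one step
        dest = unique_destination(folder, Path(src).name)
        shutil.move(str(src), str(dest))
    return dest
```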

🔐 Privacy-First Design

  • Offline-first architecture eliminates data transmission risks
  • Users value speed and control over cloud convenience
  • Local processing builds trust with your users

🎨 UX & Product Design

  • A web download happens in several steps; the file does not simply appear in one go
  • Non-technical users need simple GUI controls
  • Fallback mechanisms (keywords, filename matching) improve reliability
  • Real-time feedback and logs reduce user uncertainty

Future Enhancements

Ready to Get Started?

Bring order to your digital chaos with FileSense.