Quick Overview
Tired of your Downloads folder looking like a digital junkyard? FileSense reads each file's content, uses sentence embeddings to work out what it is about, and moves it into the matching folder automatically.
Semantic Understanding
Reads file content and understands meaning using SentenceTransformers embeddings.
Lightning Fast
FAISS indexing provides fast similarity searches locally on your machine.
OCR Enabled
Automatically extracts text from scanned PDFs and image-only documents.
Fully Offline
Works entirely offline: nothing leaves your device, so your files stay private.
Real-time Watch
Detects and organizes new files automatically as they appear.
Easy GUI
Desktop launcher with start/stop controls, logs, and system tray support.
How It Works
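FileSense runs each file through a four-step pipeline: extract text, generate an embedding, find the best-matching folder, and move the file. The numbered steps below cover setting it up and running it.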
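The "find best match" step of that pipeline can be sketched with the standard library alone. This is a minimal illustration, not FileSense's actual code: a toy bag-of-words vector stands in for the SentenceTransformers embedding, a linear scan stands in for the FAISS index, and the names `embed`, `cosine`, `best_folder`, and the sample labels are all hypothetical. Only the 0.45 threshold is taken from this README.

```python
import math
import re
from collections import Counter

THRESHOLD = 0.45  # default minimum similarity listed in this README


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_folder(text, labels, threshold=THRESHOLD):
    """Return (folder, score) for the closest label, or (None, score) if below the threshold."""
    query = embed(text)
    folder, score = max(
        ((name, cosine(query, embed(desc))) for name, desc in labels.items()),
        key=lambda pair: pair[1],
    )
    return (folder if score >= threshold else None), score


# Hypothetical folder labels: folder name -> description
labels = {
    "Physics": "physics notes on motion force energy and waves",
    "Finance": "invoice receipt bank statement and tax documents",
}

print(best_folder("bank statement and tax receipt for march", labels))  # matches "Finance"
print(best_folder("holiday photo album", labels))  # no label is close enough, folder is None
```

A file whose best score falls below the threshold is left unmatched rather than forced into the nearest folder, which is the role the THRESHOLD setting plays.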
1. Create FAISS Index
Build a semantic search index from your folder descriptions with scripts/create_index.py.
2. Process Files in Bulk
Scan and classify all files, optionally with multithreading, using scripts/script.py.
3. Watch Folder Real-time
Automatically organize new files as they arrive with scripts/watcher_script.py.
4. Launch with GUI
Use the simple desktop interface in scripts/launcher.py, which adds logs and tray control.
Project Structure
FileSense/
├── scripts/
│   ├── create_index.py       # Build FAISS index from folder labels
│   ├── process_file.py       # Extract, classify, and move files
│   ├── script.py             # Bulk organizer with threading
│   ├── watcher_script.py     # Real-time folder monitoring
│   ├── launcher.py           # GUI desktop app
│   └── multhread.py          # Multithreading handler
├── folder_labels.json        # Folder names and descriptions
├── folder_embeddings.faiss   # (auto-generated) FAISS vector index
└── files/                    # Drop unorganized files here
Core Features
| Feature | Description |
|---|---|
| Semantic Sorting | Understands file content using transformer embeddings, not just names. |
| FAISS Indexing | Builds a fast semantic search index for folder labels. |
| OCR Fallback | Extracts text from scanned/image PDFs using pdfplumber + pytesseract. |
| Keyword Boosting | Gives weight bonuses for subject-specific terms. |
| Multithreading | Handles multiple files simultaneously for faster processing. |
| Real-time Watcher | Detects and organizes files automatically as they appear. |
| GUI Launcher | Desktop interface with controls, logs, and system tray icon. |
| Offline Privacy | Works entirely offline: nothing leaves your device. |
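
Keyword boosting can be pictured as a small bonus added to the embedding similarity. The sketch below is one plausible form, not the project's actual formula: the function name `boosted_score`, the per-keyword `bonus`, and the `cap` are all assumptions chosen for illustration.

```python
THRESHOLD = 0.45  # default minimum similarity listed in this README


def boosted_score(similarity, text, keywords, bonus=0.05, cap=0.2):
    """Add a small bonus per matched keyword, capped so keywords alone cannot dominate."""
    words = set(text.lower().split())
    hits = sum(1 for kw in keywords if kw.lower() in words)
    return similarity + min(hits * bonus, cap)


base = 0.40  # hypothetical raw similarity, just under the threshold
score = boosted_score(base, "Newton laws of motion and force", ["force", "motion", "velocity"])
print(base >= THRESHOLD, score >= THRESHOLD)  # False True
```

Capping the bonus keeps a keyword-stuffed document from outranking a genuinely closer semantic match.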
Installation & Setup
Requirements
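Python 3.8+ with the libraries the scripts rely on, including SentenceTransformers, FAISS, pdfplumber, and pytesseract.
On Linux, also install the Tesseract OCR engine, which pytesseract needs in order to run.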
Quick Start
- Clone the repo:
  git clone https://github.com/ahhyoushh/filesense.git && cd filesense
- Edit folder_labels.json with your desired folder names and descriptions
- Create the index:
  python scripts/create_index.py
- Drop files in the files/ folder and run the organizer
- Choose your method: bulk process, real-time watch, or GUI launcher
Pro Tip: Use the GUI launcher for the easiest experience. It handles everything with a simple interface.
Configuration Options
| Setting | File | Description |
|---|---|---|
| --dir / -d | script.py / watcher_script.py | Directory to scan or watch |
| --threads / -t | script.py | Maximum concurrent threads (default: 4) |
| THRESHOLD | process_file.py | Minimum similarity score to accept match (default: 0.45) |
| MODEL_NAME | create_index.py | SentenceTransformer model (default: all-mpnet-base-v2) |
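
The two command-line flags in the table could be declared with `argparse` roughly as follows. This is a guess at the shape of the parser, not a copy of script.py: the function name `build_parser` and the `files` default for `--dir` are assumptions, while the flag names and the thread default of 4 come from the table.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented in the table above."""
    parser = argparse.ArgumentParser(description="Organize files by content (illustrative sketch).")
    # Default directory is an assumption; the README only names a files/ folder.
    parser.add_argument("--dir", "-d", default="files", help="Directory to scan or watch")
    parser.add_argument("--threads", "-t", type=int, default=4, help="Maximum concurrent threads")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["-d", "Downloads", "-t", "8"])
print(args.dir, args.threads)
```

THRESHOLD and MODEL_NAME are listed against source files rather than flags, so they are presumably constants edited in those files rather than passed on the command line.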
What I Learned
Embeddings & NLP
- How web browsers download files, and how to handle the .tmp files they create along the way
- SentenceTransformer embeddings enable semantic matching beyond keywords
- Embeddings capture meaning and context in high-dimensional space
- Keyword boosting improves accuracy for domain-specific classification
Vector Search at Scale
- FAISS provides fast nearest-neighbor searches locally
- Building indexes scales better than brute-force similarity
- Indexing trades memory for speed, which suits real-time processing
OCR & Document Processing
- Combining pdfplumber + pytesseract handles varied PDF formats
- Fallback strategies are critical for messy real-world data
- OCR preprocessing (contrast, rotation) significantly improves accuracy
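The fallback idea in the points above can be expressed as a chain of extractors tried in order. The sketch below is an assumption about structure only: `extract_text`, `text_layer`, and `ocr` are hypothetical names, and the two stand-in functions return fixed strings where the real project would call pdfplumber and pytesseract.

```python
def extract_text(path, extractors, min_chars=20):
    """Try each extractor in order and return the first result with enough text."""
    for extractor in extractors:
        try:
            text = extractor(path) or ""
        except Exception:
            continue  # a failing extractor should not abort the whole chain
        if len(text.strip()) >= min_chars:
            return text
    return ""  # nothing usable; the caller can fall back to filename matching


def text_layer(path):
    """Stand-in for reading a PDF's embedded text; a scanned PDF has none."""
    return ""


def ocr(path):
    """Stand-in for running OCR on rendered pages."""
    return "Scanned lecture notes on thermodynamics"


print(extract_text("scan.pdf", [text_layer, ocr]))
```

The `min_chars` check matters because a scanned PDF often yields an empty or near-empty text layer rather than an error, so "no exception" is not enough to count as success.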
Concurrent Processing
- Multithreading improves throughput for I/O-bound file operations
- Thread pools prevent resource exhaustion and improve stability
- Race conditions require careful synchronization in file operations
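One race these points allude to is two threads moving files with the same name into the same folder. A minimal sketch of a thread pool with a lock around the "pick a free name, then move" step is shown below; `safe_move` and `organize` are hypothetical names, and the real multhread.py may handle this differently.

```python
import shutil
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

_move_lock = threading.Lock()


def safe_move(src, dest_dir):
    """Move src into dest_dir, choosing a free name under a lock to avoid clobbering."""
    src, dest_dir = Path(src), Path(dest_dir)
    with _move_lock:
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / src.name
        counter = 1
        while target.exists():
            target = dest_dir / f"{src.stem}_{counter}{src.suffix}"
            counter += 1
        shutil.move(str(src), str(target))
    return target


def organize(paths, dest_dir, max_threads=4):
    """Move all paths using a bounded pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(lambda p: safe_move(p, dest_dir), paths))


# Demo: two different files that share the name notes.txt end up side by side.
with tempfile.TemporaryDirectory() as root:
    root = Path(root)
    sources = []
    for sub in ("a", "b"):
        (root / sub).mkdir()
        source = root / sub / "notes.txt"
        source.write_text(sub)
        sources.append(source)
    moved = organize(sources, root / "sorted")
    print(sorted(p.name for p in moved))
```

Only the short name-check-and-move step is serialized here; slower work such as text extraction and classification would run outside the lock, which is where the pool gains throughput.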
Privacy-First Design
- Offline-first architecture eliminates data transmission risks
- Users value speed and control over cloud convenience
- Local processing builds trust with your users
UX & Product Design
- A browser download happens in several steps; the finished file does not simply appear
- Non-technical users need simple GUI controls
- Fallback mechanisms (keywords, filename matching) improve reliability
- Real-time feedback and logs reduce user uncertainty
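Because a download arrives in stages, a watcher has to avoid acting on half-written files. The sketch below shows one common approach, offered as an assumption rather than a description of watcher_script.py: skip known temporary suffixes, then wait until the file size stops changing. Only `.tmp` is mentioned in this README; `.crdownload` and `.part` are added here as suffixes used by Chrome and Firefox, and both function names are hypothetical.

```python
import os
import time
from pathlib import Path

# .tmp comes from this README; the other two are common browser download suffixes.
TEMP_SUFFIXES = {".tmp", ".crdownload", ".part"}


def is_temp_download(path):
    """Return True for files that look like an in-progress download."""
    return Path(path).suffix.lower() in TEMP_SUFFIXES


def wait_until_stable(path, interval=0.5, checks=2, timeout=30.0):
    """Return True once the file size has stayed unchanged for `checks` consecutive polls."""
    deadline = time.monotonic() + timeout
    last_size, stable = -1, 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            return False  # file vanished, for example renamed when the download finished
        stable = stable + 1 if size == last_size else 0
        last_size = size
        if stable >= checks:
            return True
        time.sleep(interval)
    return False


print(is_temp_download("report.pdf.crdownload"), is_temp_download("report.pdf"))  # True False
```

A watcher would call `is_temp_download` first and only pass files that survive `wait_until_stable` on to classification.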
Future Enhancements
- Fine-tune models to generate folder descriptions from folder names
- Incremental FAISS updates (no full rebuilds needed)
- Better image-only classification using vision models
- Undo/recovery feature to restore moved files to their original locations
- Auto-renaming using extracted metadata
- Web dashboard for previews and remote control
- Embedding caching for faster reprocessing
Ready to Get Started?
Bring order to your digital chaos with FileSense.