Getting Started with FileSense
Get FileSense up and running in 5 minutes.
Prerequisites
Before installing FileSense, ensure you have:
- Python 3.8+ installed
- pip package manager
- Google Gemini API key (available from Google AI Studio)
- Tesseract OCR for scanned documents (installed separately; see Step 3)
Check your Python version:
python --version
# Should show Python 3.8 or higher
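The same check can be made from inside Python, which helps when several interpreters are installed and it is unclear which one `python` resolves to. This is a minimal sketch; the function name `meets_minimum` is illustrative and not part of FileSense.

```python
import sys

MINIMUM = (3, 8)  # minimum version stated in the prerequisites


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```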
Installation
Step 1: Clone the Repository
git clone https://github.com/ahhyoushh/FileSense.git
cd FileSense
Step 2: Install Dependencies
pip install -r requirements.txt
Required packages:
- sentence-transformers - SBERT embeddings (BGE-Base v1.5)
- faiss-cpu - Vector similarity search
- google-genai - Gemini API client
- pdfplumber - PDF text extraction
- python-docx - DOCX file handling
- pytesseract - OCR support
- watchdog - File system monitoring
- pystray - System tray integration
- python-dotenv - Environment variables
Step 3: Install Tesseract OCR
Ubuntu/Debian:
sudo apt update
sudo apt install tesseract-ocr
macOS (assuming Homebrew is installed):
brew install tesseract
Windows:
Download from GitHub Releases
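After installing, it can help to confirm that the `tesseract` binary is actually reachable, since `pytesseract` is only a wrapper around it. The sketch below uses the standard library only; `find_tesseract` is a hypothetical helper name, not a FileSense function.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    print(f"Tesseract found at {path}" if path else "Tesseract not found on PATH")
```

If this reports that Tesseract is missing right after installation, opening a new terminal so that PATH changes take effect is usually worth trying first.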
API Key Setup
Get Your Gemini API Key
- Visit Google AI Studio
- Sign in with your Google account
- Click “Create API Key”
- Copy the generated key
Create a .env file in the project root:
# .env
API_KEY=your_gemini_api_key_here
Security tip: Never commit .env to version control!
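To confirm that the key is readable before running FileSense, the file can be parsed directly. The project lists python-dotenv for this job; the sketch below is a standard-library stand-in that only handles plain `KEY=VALUE` lines, and `read_env_value` is an illustrative name rather than anything FileSense defines.

```python
from pathlib import Path


def read_env_value(env_path, key="API_KEY"):
    """Return the value for `key` from a KEY=VALUE file, or None if absent."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Tolerate surrounding whitespace and optional quotes.
            return value.strip().strip('"').strip("'") or None
    return None


if __name__ == "__main__":
    found = read_env_value(".env")
    print("API_KEY is set" if found else "API_KEY is missing from .env")
```

The script prints only whether the key is present, never the key itself, so its output is safe to paste into a bug report.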
Initialize FileSense
Create the Initial Index
Run the index builder once, even though no labels exist yet:
python scripts/create_index.py
Expected output:
[!] No folder labels found in the JSON file. Cannot create index.
This is normal for first-time setup. The index will be created automatically when you process your first file.
Verify Installation
Test that everything is working:
# Check if all imports work
python -c "import sentence_transformers, faiss, google.genai; print('Success: All dependencies installed')"
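The one-liner above stops at the first failing import. A slightly longer check can report every missing package at once. The sketch below maps each pip package from the requirements list to the module name it is imported under; `missing_packages` is a hypothetical helper, and it only tests whether each module can be located, not whether it works.

```python
import importlib.util

# pip package name -> module name used in `import`
REQUIRED = {
    "sentence-transformers": "sentence_transformers",
    "faiss-cpu": "faiss",
    "google-genai": "google.genai",
    "pdfplumber": "pdfplumber",
    "python-docx": "docx",
    "pytesseract": "pytesseract",
    "watchdog": "watchdog",
    "pystray": "pystray",
    "python-dotenv": "dotenv",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be located."""
    missing = []
    for package, module in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is itself absent.
            found = False
        if not found:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found" if not missing else "Missing: " + ", ".join(missing))
```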
First Run
Option A: Windows Launcher (Recommended)
Double-click the FileSense_Launcher.bat file in the project folder.
Alternatively, run it from command line:
FileSense_Launcher.bat
Features:
- One-click startup (no manual environment activation needed)
- Clean visual dashboard
- Real-time logs
- System tray integration
Option B: Command Line
Process files from the command line:
# Basic usage
python scripts/script.py --dir ./files
# With custom settings
python scripts/script.py --dir ./files --threads 8 --no-generation
CLI Options:
| Flag | Description | Default |
| --- | --- | --- |
| --dir | Directory to organize | ./files |
| --threads | Number of concurrent threads | 6 |
| --single-thread | Disable multithreading | False |
| --no-generation | Don't generate new labels | False |
| --train | Enable training mode | False |
| --auto-save-logs | Auto-save logs | False |
| --no-logs | Disable logging | False |
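To make the flag semantics concrete, the options above can be expressed as an `argparse` parser. This is an illustrative reconstruction from the table only, not the parser that `scripts/script.py` actually contains; the help strings and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Illustrative FileSense CLI parser")
    parser.add_argument("--dir", default="./files", help="Directory to organize")
    parser.add_argument("--threads", type=int, default=6,
                        help="Number of concurrent threads")
    # Boolean switches default to False and become True when present.
    parser.add_argument("--single-thread", action="store_true",
                        help="Disable multithreading")
    parser.add_argument("--no-generation", action="store_true",
                        help="Don't generate new labels")
    parser.add_argument("--train", action="store_true", help="Enable training mode")
    parser.add_argument("--auto-save-logs", action="store_true", help="Auto-save logs")
    parser.add_argument("--no-logs", action="store_true", help="Disable logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--dir", "./files", "--threads", "8", "--no-generation"])
    print(args.dir, args.threads, args.no_generation)
```

All the `False` defaults in the table behave as switches in this sketch: passing the flag with no value turns the option on.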
Option C: File Watcher
Monitor a directory and auto-sort new files:
python scripts/watcher_script.py --dir ./Downloads
Useful for organizing downloads in real time.
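The idea behind a watcher can be sketched without any dependencies: remember which files have been seen and report only the new ones. FileSense lists `watchdog` for file system monitoring, which reacts to events rather than polling, so the code below is a simplified stand-in and not how `watcher_script.py` works; `new_files` is a hypothetical name.

```python
from pathlib import Path


def new_files(directory, seen):
    """Return files in `directory` not yet in `seen`, and record them as seen."""
    fresh = sorted(
        p for p in Path(directory).iterdir() if p.is_file() and p not in seen
    )
    seen.update(fresh)
    return fresh


if __name__ == "__main__":
    import time

    seen = set()
    for _ in range(3):  # a real watcher would loop until stopped
        for path in new_files(".", seen):
            print(f"new file: {path.name}")
        time.sleep(1)
```

Polling is easy to reason about but reacts only once per interval, which is one reason an event-based library is the more common choice for this job.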
Directory Structure
After installation, your project should look like this:
FileSense/
├── .env # API key
├── folder_labels.json # Label database (auto-created)
├── folder_embeddings.faiss # Vector index (auto-created)
├── scripts/
│ ├── RL/ # Reinforcement Learning Module
│ │ ├── rl_policy.py # RL Agent Logic
│ │ ├── rl_feedback.py # Reward Mechanism
│ │ ├── rl_config.py # Configuration
│ │ ├── rl_supabase.py # Cloud Logging
│ │ └── rl_audit_safe.py # Safety Checks
│ ├── logger/ # System Logging
│ │ ├── logger.py # Main Logger
│ │ └── rl_logger.py # RL Logger
│ ├── classify_process_file.py # Classification Logic
│ ├── generate_label.py # Gemini Integration
│ ├── create_index.py # Index Builder
│ ├── extract_text.py # Text Extraction (OCR)
│ ├── multhread.py # Parallel Processing
│ ├── launcher.py # GUI App
│ ├── script.py # CLI Runner
│ └── watcher_script.py # Folder Watcher
├── evaluation/ # Metrics, Logs & JSONs
├── files/ # Input directory
├── sorted/ # Output directory
└── logs/ # Execution Logs
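A quick way to compare a checkout against this layout is to test for a handful of the paths shown in the tree. The sketch below checks only a small, hand-picked subset and leaves out the files marked auto-created, since they do not exist before the first run; `missing_paths` and the `EXPECTED` list are illustrative choices.

```python
from pathlib import Path

# A subset of the tree above; auto-created files are deliberately left out.
EXPECTED = [
    ".env",
    "scripts/script.py",
    "scripts/watcher_script.py",
    "scripts/create_index.py",
    "files",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    print("Layout looks complete" if not missing else "Missing: " + ", ".join(missing))
```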
Test with Sample Files
Create Test Files
mkdir -p files
cd files
# Create sample files
echo "Newton's laws of motion describe force and acceleration" > physics_test.txt
echo "The mitochondria is the powerhouse of the cell" > biology_test.txt
echo "Calculate the derivative of x^2 using the power rule" > math_test.txt
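The commands above assume a POSIX shell. On systems where `mkdir -p` or the quoting behaves differently, the same three files can be created from Python. This sketch is run from the project root and writes the same text as the `echo` commands; `create_samples` is a hypothetical helper.

```python
from pathlib import Path

SAMPLES = {
    "physics_test.txt": "Newton's laws of motion describe force and acceleration",
    "biology_test.txt": "The mitochondria is the powerhouse of the cell",
    "math_test.txt": "Calculate the derivative of x^2 using the power rule",
}


def create_samples(target="files"):
    """Write the sample files into `target` and return the created paths."""
    folder = Path(target)
    folder.mkdir(parents=True, exist_ok=True)
    created = []
    for name, text in SAMPLES.items():
        path = folder / name
        path.write_text(text + "\n", encoding="utf-8")
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_samples():
        print(f"created {path}")
```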
Run Classification
cd ..
python scripts/script.py --dir ./files
What happens:
- FileSense extracts text from each file
- Generates embeddings using SBERT
- Asks Gemini to create labels (first run)
- Builds FAISS index
- Classifies and moves files to sorted/
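The classification step in the list above comes down to comparing a file's embedding with one embedding per label and picking the closest. The toy sketch below shows that idea with hand-written three-dimensional vectors and a brute-force search. It assumes cosine similarity, which this guide does not confirm; in FileSense the vectors come from SBERT and the search is done by FAISS, and the label names here are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(file_vector, label_vectors):
    """Return (label, score) for the label vector most similar to the file."""
    scored = ((label, cosine(file_vector, vec)) for label, vec in label_vectors.items())
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins for label embeddings; real ones have hundreds of dimensions.
LABELS = {
    "Physics": [1.0, 0.0, 0.0],
    "Biology": [0.0, 1.0, 0.0],
    "Maths": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    label, score = classify([0.9, 0.1, 0.0], LABELS)
    print(label)  # Physics
```

A vector index such as FAISS serves the same purpose as the `max` call here, but stays fast when there are many labels.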
Next Steps
Congratulations! FileSense is now installed.
Learn More:
- FAQ - Common questions and troubleshooting
Troubleshooting
Common Issues
ModuleNotFoundError: No module named 'sentence_transformers'
pip install sentence-transformers
FAISS installation fails
# Try CPU version
pip install faiss-cpu
# Or GPU version (if you have CUDA)
pip install faiss-gpu
Tesseract not found (Windows)
- Install Tesseract from GitHub
- Add the install folder to PATH, for example C:\Program Files\Tesseract-OCR
API Key not working
- Check that the .env file exists in the project root
- Verify that the API key is valid at Google AI Studio
- Ensure there are no extra spaces in the .env file