Getting Started with FileSense

Get FileSense up and running in 5 minutes.

Prerequisites

Before installing FileSense, ensure you have:

Python 3.8+ installed
pip package manager
Google Gemini API key (available from Google AI Studio; see API Key Setup)
Tesseract OCR for scanned documents (install steps for Linux, macOS, and Windows are in Step 3)

Check your Python version:

python --version
# Should show Python 3.8 or higher
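If several interpreters are installed, it can help to run the check from inside the interpreter that will run FileSense. The following is a minimal sketch; `python_is_supported` and `MIN_VERSION` are illustrative names, not part of FileSense.

```python
import sys

# Minimum version stated in the prerequisites.
MIN_VERSION = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when (major, minor) meets the minimum version."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is supported")
else:
    print(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required")
```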

Installation

Step 1: Clone the Repository

git clone https://github.com/ahhyoushh/FileSense.git
cd FileSense

Step 2: Install Dependencies

pip install -r requirements.txt

Required packages:

sentence-transformers - SBERT embeddings (BGE-Base v1.5)
faiss-cpu - Vector similarity search
google-genai - Gemini API client
pdfplumber - PDF text extraction
python-docx - DOCX file handling
pytesseract - OCR support
watchdog - File system monitoring
pystray - System tray integration
python-dotenv - Environment variables

Step 3: Install Tesseract OCR

Ubuntu/Debian:

sudo apt update
sudo apt install tesseract-ocr

macOS:

brew install tesseract

Windows: Download from GitHub Releases
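To confirm that the Tesseract binary is reachable after installation, a short standard-library check may be enough. This sketch only looks for an executable named `tesseract` on PATH; `find_executable` is a hypothetical helper, and how FileSense itself locates Tesseract is not shown here.

```python
import shutil


def find_executable(name="tesseract"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_executable()
if path:
    print(f"Tesseract found at {path}")
else:
    print("Tesseract not found on PATH; OCR for scanned documents may not work")
```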

API Key Setup

Get Your Gemini API Key

Visit Google AI Studio
Sign in with your Google account
Click “Create API Key”
Copy the generated key

Configure Environment

Create a .env file in the project root:

# .env
API_KEY=your_gemini_api_key_here

Security tip: Never commit .env to version control!

Initialize FileSense

Create the Initial Index

Run the index builder once, even though no labels exist yet:

python scripts/create_index.py

Expected output:

[!] No folder labels found in the JSON file. Cannot create index.

This message is normal for first-time setup: with no labels, there is nothing to index yet. The index will be created automatically when you process your first file.

Verify Installation

Test that everything is working:

# Check if all imports work
python -c "import sentence_transformers, faiss, google.genai; print('Success: All dependencies installed')"
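The one-liner above stops at the first failing import. As a rough alternative, the sketch below reports every missing package at once without importing the heavy libraries. The import names are assumptions based on each package's usual module name (for example, python-docx is imported as `docx` and python-dotenv as `dotenv`).

```python
import importlib.util

# Import names assumed for the packages listed under "Required packages".
MODULES = [
    "sentence_transformers", "faiss", "google.genai", "pdfplumber",
    "docx", "pytesseract", "watchdog", "pystray", "dotenv",
]


def is_installed(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google") is itself missing.
        return False


def missing_modules(names=MODULES):
    """Return the subset of `names` that cannot be located."""
    return [name for name in names if not is_installed(name)]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("Success: All dependencies installed")
```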

First Run

Option A: Windows Launcher (Recommended)

Double-click the FileSense_Launcher.bat file in the project folder.

Alternatively, run it from command line:

FileSense_Launcher.bat

Features:

One-click startup (No manual environment activation needed)
Clean visual dashboard
Real-time logs
System tray integration

Option B: Command Line

Process files from the command line:

# Basic usage
python scripts/script.py --dir ./files

# With custom settings
python scripts/script.py --dir ./files --threads 8 --no-generation

CLI Options:

Flag	Description	Default
`--dir`	Directory to organize	`./files`
`--threads`	Number of concurrent threads	`6`
`--single-thread`	Disable multithreading	`False`
`--no-generation`	Don’t generate new labels	`False`
`--train`	Enable training mode	`False`
`--auto-save-logs`	Auto-save logs	`False`
`--no-logs`	Disable logging	`False`
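To make the table concrete, here is a sketch of how these flags could be declared with `argparse`. It mirrors the flag names and defaults listed above but is not taken from scripts/script.py; treating the boolean flags as simple on/off switches is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the FileSense CLI flags")
    parser.add_argument("--dir", default="./files", help="Directory to organize")
    parser.add_argument("--threads", type=int, default=6,
                        help="Number of concurrent threads")
    parser.add_argument("--single-thread", action="store_true",
                        help="Disable multithreading")
    parser.add_argument("--no-generation", action="store_true",
                        help="Don't generate new labels")
    parser.add_argument("--train", action="store_true", help="Enable training mode")
    parser.add_argument("--auto-save-logs", action="store_true", help="Auto-save logs")
    parser.add_argument("--no-logs", action="store_true", help="Disable logging")
    return parser


# Same arguments as the "With custom settings" example.
args = build_parser().parse_args(["--dir", "./files", "--threads", "8", "--no-generation"])
print(args.dir, args.threads, args.no_generation, args.train)
```

Flags that are not passed keep the defaults from the table, so `args.train` is `False` in this example.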

Option C: File Watcher

Monitor a directory and auto-sort new files:

python scripts/watcher_script.py --dir ./Downloads

This is useful for organizing downloads in real time.
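The idea behind a watcher can be illustrated with the standard library alone. FileSense lists watchdog for file system monitoring; the sketch below swaps in plain directory polling instead, so it is a stand-in for the concept and not the logic of watcher_script.py. The names `snapshot` and `new_files` are hypothetical.

```python
import os
import tempfile


def snapshot(directory):
    """Return the set of regular file names currently in `directory`."""
    with os.scandir(directory) as entries:
        return {entry.name for entry in entries if entry.is_file()}


def new_files(directory, seen):
    """Return (sorted names added since `seen`, updated snapshot)."""
    current = snapshot(directory)
    return sorted(current - seen), current


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as watch_dir:
    seen = snapshot(watch_dir)
    with open(os.path.join(watch_dir, "report.txt"), "w") as fh:
        fh.write("hello")
    added, seen = new_files(watch_dir, seen)
    print(added)  # ['report.txt']
```

A long-running watcher would call `new_files` in a loop with a short sleep and hand each new name to the classifier. Event-based libraries such as watchdog avoid the polling delay.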

Directory Structure

After installation, your project should look like this:

FileSense/
├── .env                          # API key
├── folder_labels.json            # Label database (auto-created)
├── folder_embeddings.faiss       # Vector index (auto-created)
├── scripts/
│   ├── RL/                       # Reinforcement Learning Module
│   │   ├── rl_policy.py          # RL Agent Logic
│   │   ├── rl_feedback.py        # Reward Mechanism
│   │   ├── rl_config.py          # Configuration
│   │   ├── rl_supabase.py        # Cloud Logging
│   │   └── rl_audit_safe.py      # Safety Checks
│   ├── logger/                   # System Logging
│   │   ├── logger.py             # Main Logger
│   │   └── rl_logger.py          # RL Logger
│   ├── classify_process_file.py  # Classification Logic
│   ├── generate_label.py         # Gemini Integration
│   ├── create_index.py           # Index Builder
│   ├── extract_text.py           # Text Extraction (OCR)
│   ├── multhread.py              # Parallel Processing
│   ├── launcher.py               # GUI App
│   ├── script.py                 # CLI Runner
│   └── watcher_script.py         # Folder Watcher
├── evaluation/                   # Metrics, Logs & JSONs
├── files/                        # Input directory
├── sorted/                       # Output directory
└── logs/                         # Execution Logs

Test with Sample Files

Create Test Files

mkdir -p files
cd files

# Create sample files
echo "Newton's laws of motion describe force and acceleration" > physics_test.txt
echo "The mitochondria is the powerhouse of the cell" > biology_test.txt
echo "Calculate the derivative of x^2 using the power rule" > math_test.txt

Run Classification

cd ..
python scripts/script.py --dir ./files

What happens:

FileSense extracts text from each file
Generates embeddings using SBERT
Asks Gemini to create labels (first run)
Builds FAISS index
Classifies and moves files to sorted/
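The matching step in this pipeline can be sketched as a nearest-label search over embeddings. The example below uses hand-written 3-dimensional vectors and plain cosine similarity in place of real SBERT embeddings and a FAISS index. The `threshold` value and the function names are hypothetical and are not taken from FileSense.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(file_vec, label_vecs, threshold=0.5):
    """Return (best_label, score); the label is None when no score reaches the threshold."""
    best_label, best_score = None, -1.0
    for label, vec in label_vecs.items():
        score = cosine(file_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy "embeddings" (made-up values, not real model output).
labels = {
    "Physics": [0.9, 0.1, 0.0],
    "Biology": [0.1, 0.9, 0.0],
    "Math": [0.0, 0.1, 0.9],
}
label, score = classify([0.8, 0.2, 0.1], labels)
print(label)  # Physics
```

In this sketch, a `None` result marks a file that matches no known label well enough. That is presumably the point at which a new label would be requested from Gemini, unless `--no-generation` is set.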

Next Steps

Congratulations! FileSense is now installed.

Learn More:

FAQ - Common questions and troubleshooting

Advanced Topics:

Architecture - How FileSense works internally
API Reference - Function documentation
Performance Metrics - Benchmarks and optimization

Troubleshooting

Common Issues

ModuleNotFoundError: No module named 'sentence_transformers'

pip install sentence-transformers

FAISS installation fails

# Try CPU version
pip install faiss-cpu

# Or GPU version (if you have CUDA; faiss-gpu wheels may not be available for every platform or Python version)
pip install faiss-gpu

Tesseract not found (Windows)

Install Tesseract from GitHub
Add the install directory to PATH (typically C:\Program Files\Tesseract-OCR)

API Key not working

Check .env file exists in project root
Verify API key is valid at Google AI Studio
Ensure no extra spaces in .env file
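The first and third checks can be automated with a small standard-library script. This is a sketch that mirrors the checklist only: `check_env_file` is a hypothetical helper, and python-dotenv itself may tolerate some of the spacing that this check flags.

```python
import os
import tempfile


def check_env_file(path=".env", key="API_KEY"):
    """Return a list of problems found for `key` in the file at `path`."""
    if not os.path.isfile(path):
        return [f"{path} not found"]
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    for line in lines:
        stripped = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        name, value = line.split("=", 1)
        if name.strip() != key:
            continue
        problems = []
        if name != name.strip() or value != value.strip():
            problems.append(f"extra spaces around {key} or its value")
        if not value.strip():
            problems.append(f"{key} is empty")
        return problems
    return [f"{key} is not defined in {path}"]


# Demonstration with a deliberately mis-spaced sample file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, ".env")
    with open(sample, "w", encoding="utf-8") as fh:
        fh.write("API_KEY = your_gemini_api_key_here\n")
    sample_problems = check_env_file(sample)
    print(sample_problems)
```

Running `check_env_file()` from the project root checks the real .env file. An empty list means none of these problems were found, but it does not verify that the key is valid.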

Additional Resources

GitHub Repository: ahhyoushh/FileSense
Demo Video: YouTube
Project Website: ahhyoushh.github.io/FileSense
