FileSense Documentation

Getting Started with FileSense

Get FileSense up and running in 5 minutes.


Prerequisites

Before installing FileSense, ensure you have Python 3.8 or higher, plus git and pip, which the installation steps use.

Check your Python version:

python --version
# Should show Python 3.8 or higher
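
The same check can be made from inside Python, which is convenient in a setup script. This is a minimal sketch; the helper name meets_minimum is hypothetical and not part of FileSense.

```python
import sys

MIN_VERSION = (3, 8)  # minimum version stated in the prerequisites


def meets_minimum(version=None, minimum=MIN_VERSION):
    """Return True if a (major, minor, ...) version is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor so patch levels do not matter.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version OK")
    else:
        print("Python 3.8 or higher is required")
```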

Installation

Step 1: Clone the Repository

git clone https://github.com/ahhyoushh/FileSense.git
cd FileSense

Step 2: Install Dependencies

pip install -r requirements.txt

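Required packages include sentence-transformers, FAISS (faiss-cpu or faiss-gpu), and the Google GenAI client (imported as google.genai).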

Step 3: Install Tesseract OCR

Ubuntu/Debian:

sudo apt update
sudo apt install tesseract-ocr

macOS:

brew install tesseract

Windows: Download from GitHub Releases
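
To confirm that the Tesseract binary is reachable after installing it, you can look it up on PATH. A small sketch using only the standard library; the function name find_tesseract is an assumption for illustration, and how FileSense itself locates Tesseract is not covered here.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    print(f"Tesseract found at {path}" if path else "Tesseract not found on PATH")
```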


API Key Setup

Get Your Gemini API Key

  1. Visit Google AI Studio
  2. Sign in with your Google account
  3. Click “Create API Key”
  4. Copy the generated key

Configure Environment

Create a .env file in the project root:

# .env
API_KEY=your_gemini_api_key_here

Security tip: Never commit .env to version control!
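
For reference, a .env file of this shape can be read with a few lines of standard-library Python. This is only a sketch under stated assumptions: the helpers read_env_file and get_api_key are hypothetical, and FileSense may load the key differently (for example through a dotenv library).

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path=".env"):
    """Prefer API_KEY from the environment, then fall back to the .env file."""
    key = os.environ.get("API_KEY")
    if key:
        return key
    if Path(path).is_file():
        return read_env_file(path).get("API_KEY")
    return None
```

Checking the environment first lets you override the file on a per-shell basis without editing .env.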


Initialize FileSense

Create the Initial Index

Run the index builder once before processing files, even though no labels exist yet:

python scripts/create_index.py

Expected output:

[!] No folder labels found in the JSON file. Cannot create index.

This is normal for first-time setup. The index will be created automatically when you process your first file.


Verify Installation

Test that everything is working:

# Check if all imports work
python -c "import sentence_transformers, faiss, google.genai; print('Success: All dependencies installed')"
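
If the one-liner fails, it does not say which package is missing. The sketch below reports each module separately; the function name check_modules is hypothetical, and the module list simply mirrors the imports in the command above.

```python
import importlib.util

# Module names taken from the verification command.
REQUIRED = ["sentence_transformers", "faiss", "google.genai"]


def check_modules(names=REQUIRED):
    """Map each module name to True if Python can locate it."""
    status = {}
    for name in names:
        try:
            # find_spec locates the module; for dotted names it imports the parent package.
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package (such as `google`) raises instead of returning None.
            status[name] = False
    return status


if __name__ == "__main__":
    for module, found in check_modules().items():
        print(f"{module}: {'ok' if found else 'MISSING'}")
```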

First Run

Option A: GUI Launcher

On Windows, double-click the FileSense_Launcher.bat file in the project folder.

Alternatively, run it from the command line:

FileSense_Launcher.bat

Option B: Command Line

Process files from the command line:

# Basic usage
python scripts/script.py --dir ./files

# With custom settings
python scripts/script.py --dir ./files --threads 8 --no-generation

CLI Options:

Flag              Description                    Default
--dir             Directory to organize          ./files
--threads         Number of concurrent threads   6
--single-thread   Disable multithreading         False
--no-generation   Don’t generate new labels      False
--train           Enable training mode           False
--auto-save-logs  Auto-save logs                 False
--no-logs         Disable logging                False
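
To make the flag semantics concrete, here is how a parser with these options and defaults could be declared with argparse. This is an illustrative sketch, not the actual source of scripts/script.py; the function name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and their defaults."""
    parser = argparse.ArgumentParser(description="Organize files by content (sketch)")
    parser.add_argument("--dir", default="./files", help="Directory to organize")
    parser.add_argument("--threads", type=int, default=6, help="Number of concurrent threads")
    # Boolean switches default to False and become True when the flag is present.
    parser.add_argument("--single-thread", action="store_true", help="Disable multithreading")
    parser.add_argument("--no-generation", action="store_true", help="Don't generate new labels")
    parser.add_argument("--train", action="store_true", help="Enable training mode")
    parser.add_argument("--auto-save-logs", action="store_true", help="Auto-save logs")
    parser.add_argument("--no-logs", action="store_true", help="Disable logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--dir", "./files", "--threads", "8", "--no-generation"])
    print(args.dir, args.threads, args.no_generation)
```

Note that argparse converts dashes to underscores, so --no-generation is read as args.no_generation.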

Option C: File Watcher

Monitor a directory and auto-sort new files:

python scripts/watcher_script.py --dir ./Downloads

Perfect for organizing downloads in real time!
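
The idea behind a watcher can be sketched with simple polling: list the directory periodically and hand each file that has not been seen before to a handler. This is an assumption-laden illustration, not how watcher_script.py is implemented (it may well use filesystem events instead); find_new_files and watch are hypothetical names.

```python
import time
from pathlib import Path


def find_new_files(directory, seen):
    """Return files in `directory` not yet in `seen`, and record them in `seen`."""
    new_files = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(directory, handle, interval=2.0, max_polls=None):
    """Poll `directory`, calling `handle(path)` once per newly seen file.

    Files already present on the first poll are treated as new.
    With max_polls=None the loop runs until interrupted.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(directory, seen):
            handle(path)
        polls += 1
        time.sleep(interval)
```

A call such as watch("./Downloads", print) would print each new file path as it appears; a real handler would classify and move the file instead.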


Directory Structure

After installation and a first run, your project should look like this (entries marked auto-created appear only once FileSense has processed files):

FileSense/
├── .env                          # API key
├── folder_labels.json            # Label database (auto-created)
├── folder_embeddings.faiss       # Vector index (auto-created)
├── scripts/
│   ├── RL/                       # Reinforcement Learning Module
│   │   ├── rl_policy.py          # RL Agent Logic
│   │   ├── rl_feedback.py        # Reward Mechanism
│   │   ├── rl_config.py          # Configuration
│   │   ├── rl_supabase.py        # Cloud Logging
│   │   └── rl_audit_safe.py      # Safety Checks
│   ├── logger/                   # System Logging
│   │   ├── logger.py             # Main Logger
│   │   └── rl_logger.py          # RL Logger
│   ├── classify_process_file.py  # Classification Logic
│   ├── generate_label.py         # Gemini Integration
│   ├── create_index.py           # Index Builder
│   ├── extract_text.py           # Text Extraction (OCR)
│   ├── multhread.py              # Parallel Processing
│   ├── launcher.py               # GUI App
│   ├── script.py                 # CLI Runner
│   └── watcher_script.py         # Folder Watcher
├── evaluation/                   # Metrics, Logs & JSONs
├── files/                        # Input directory
├── sorted/                       # Output directory
└── logs/                         # Execution Logs
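
A quick way to compare your checkout against this layout is to test for a few key paths. The sketch below is a convenience check only; the EXPECTED list is a hand-picked subset of the tree above and missing_paths is a hypothetical helper.

```python
from pathlib import Path

# A subset of the documented layout; auto-created files are deliberately left out.
EXPECTED = [
    ".env",
    "scripts/script.py",
    "scripts/create_index.py",
    "scripts/watcher_script.py",
    "files",
]


def missing_paths(root=".", expected=EXPECTED):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    print("Layout OK" if not missing else f"Missing: {', '.join(missing)}")
```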

Test with Sample Files

Create Test Files

mkdir -p files
cd files

# Create sample files
echo "Newton's laws of motion describe force and acceleration" > physics_test.txt
echo "The mitochondria is the powerhouse of the cell" > biology_test.txt
echo "Calculate the derivative of x^2 using the power rule" > math_test.txt

Run Classification

cd ..
python scripts/script.py --dir ./files

What happens:

  1. FileSense extracts text from each file
  2. Generates embeddings using SBERT
  3. Asks Gemini to create labels (first run)
  4. Builds FAISS index
  5. Classifies and moves files to sorted/
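
The classification step above boils down to "embed the text, embed each label, pick the nearest label". The toy sketch below shows that idea with a bag-of-words vector and cosine similarity standing in for SBERT embeddings and the FAISS index, so it runs without any model download. The label descriptions are made up for illustration; in FileSense the labels are generated by Gemini.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an SBERT embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, labels):
    """Return (best_label, score) for `text` against a {label: description} mapping."""
    query = embed(text)
    scored = [(cosine(query, embed(desc)), name) for name, desc in labels.items()]
    score, name = max(scored)
    return name, score


# Hypothetical label descriptions, for illustration only.
labels = {
    "Physics": "force motion acceleration energy laws",
    "Biology": "cell mitochondria organism dna",
    "Maths": "derivative calculus algebra power rule",
}

if __name__ == "__main__":
    print(classify("The mitochondria is the powerhouse of the cell", labels)[0])
```

A real embedding model captures meaning rather than exact word overlap, which is why FileSense can match files whose wording differs from the label text.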

Next Steps

Congratulations! FileSense is now installed.


Troubleshooting

Common Issues

Import Error: No module named ‘sentence_transformers’

pip install sentence-transformers

FAISS installation fails

# Try CPU version
pip install faiss-cpu

# Or GPU version (if you have CUDA)
pip install faiss-gpu

Tesseract not found (Windows)

API Key not working


