🗂️ FileSense

Quick Overview

FileSense is a self-organizing file manager that sorts documents by meaning, not just filenames. Unlike rule-based organizers, it uses SentenceTransformers embeddings and a FAISS index to match each file to a folder by its content.

🤖 Generative Labeling

Uses Google Gemini to analyze unknown files and auto-create specific folder categories.

🟣 Reinforcement Learning

An epsilon-greedy bandit agent learns a policy that balances speed against accuracy.

🧠 Semantic Search

Vector embeddings place "Newton" close to "Physics", so files are matched without explicit rules.

⚡ Live Indexing

The FAISS index is rebuilt whenever new labels are generated.

🚀 Quick Start:

Double-click FileSense_Launcher.bat to start the app without using the command line.

Make sense of this? Read the Wiki 📖 📽️ Video

Demo Video

A short walkthrough demo.

How It Works

The system follows a decision pipeline: Identify → RL Agent Decision → Search OR Generate → Move.

1️⃣ Semantic Classification

Files are read (as plain text or via OCR), encoded into vectors, and compared against the folder_embeddings.faiss index. High-similarity matches are sorted immediately.
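
The matching step can be sketched roughly as follows. This is an illustration only, not the project's code: the hand-written vectors stand in for SentenceTransformers embeddings, the brute-force loop stands in for the FAISS lookup, and the `classify` helper and its `threshold` value of 0.4 are assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(file_vec, folder_vecs, threshold=0.4):
    """Return (label, score); label is None when the best score is below threshold.

    `threshold` is a hypothetical cutoff chosen for this sketch.
    """
    best_label, best_score = None, -1.0
    for label, vec in folder_vecs.items():
        score = cosine(file_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score  # low confidence: leave the file for the fallback path
    return best_label, best_score

# Toy 3-dimensional "embeddings"; real ones would come from an embedding model.
folder_vecs = {"Physics": [0.9, 0.1, 0.0], "Finance": [0.0, 0.2, 0.9]}
label, score = classify([0.8, 0.2, 0.1], folder_vecs)
print(label, round(score, 3))
```

A file whose best score falls below the cutoff returns `None` as its label, which is the case the later pipeline stages handle.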

2️⃣ The RL Agent (Epsilon-Greedy)

Before calling any API, the Epsilon-Greedy Agent evaluates the state. It decides whether to:

Exploit: Use the safest/cheapest known method (Vector Search).
Explore: Attempt to find a better label using GenAI (if permitted by policy).
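
For readers unfamiliar with the technique, here is a minimal generic epsilon-greedy bandit. It is a sketch under assumptions, not the contents of rl_policy.py: the class name, the action names `vector_search` and `genai`, and the reward values are all hypothetical. In this textbook form, "exploit" picks the action with the highest estimated reward and "explore" picks an action at random; how FileSense maps these onto Vector Search and GenAI may differ.

```python
import random

class EpsilonGreedyAgent:
    """Generic epsilon-greedy bandit with incremental mean reward estimates."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}
        self.rng = random.Random(seed)

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental mean: new_value = old_value + (reward - old_value) / n
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n

# Hypothetical action names and rewards, for illustration only.
agent = EpsilonGreedyAgent(["vector_search", "genai"], epsilon=0.0)
agent.update("vector_search", 1.0)
agent.update("genai", 0.2)
choice = agent.choose()
print(choice)
```

With `epsilon=0.0` the agent never explores, so it always returns the action with the highest running average; raising epsilon trades some of that certainty for a chance to discover a better action.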

3️⃣ Generative Fallback

If the Agent permits, low-confidence files are sent to Google Gemini to:

Generate a broad Category Label.
Create description keywords.
Update folder_labels.json and rebuild the index.
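
The label-update step might look something like the sketch below. Several things here are assumptions: the real schema of folder_labels.json is not documented in this README, so a simple mapping from label to a keyword list is used; `fake_generate_label` is a hypothetical stand-in for the Gemini call and returns fixed values; and the index rebuild is left out entirely.

```python
import json
import os
import tempfile

def fake_generate_label(text):
    """Hypothetical stand-in for the Gemini call; returns (label, keywords)."""
    return "Physics", ["newton", "force", "motion"]

def add_label(labels_path, label, keywords):
    """Merge a label and its keywords into the JSON file, creating it if needed.

    Assumes a {label: [keywords]} layout, which may not match the real file.
    """
    labels = {}
    if os.path.exists(labels_path):
        with open(labels_path, "r", encoding="utf-8") as f:
            labels = json.load(f)
    existing = labels.get(label, [])
    # Append only keywords that are not already recorded for this label.
    labels[label] = existing + [k for k in keywords if k not in existing]
    with open(labels_path, "w", encoding="utf-8") as f:
        json.dump(labels, f, indent=2)
    return labels

# Demo in a temporary directory so no real project file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "folder_labels.json")
    label, keywords = fake_generate_label("F = ma describes force and motion")
    add_label(path, label, keywords)
    labels = add_label(path, label, ["force", "gravity"])

print(labels)
```

Merging rather than overwriting keeps earlier keywords when the same label is generated again; after a write like this, the embeddings index would need to be rebuilt so the new label becomes searchable.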

Project Structure

FileSense/
├── scripts/
│   ├── RL/                       # Reinforcement Learning
│   │   ├── rl_policy.py          # Epsilon-Greedy Agent
│   │   ├── rl_feedback.py        # Reward System
│   │   ├── rl_config.py          # Hyperparameters
│   │   └── rl_supabase.py        # Cloud Logs
│   ├── logger/                   # Logging System
│   │   ├── logger.py             # Main Logger
│   │   └── rl_logger.py          # RL Logger
│   ├── classify_process_file.py  # Core Logic
│   ├── generate_label.py         # Gemini Interface
│   ├── create_index.py           # FAISS Indexer
│   ├── extract_text.py           # OCR Engine
│   └── launcher.py               # GUI App
├── evaluation/                   # Metrics
├── landing/                      # Website
└── wiki/                         # Documentation

Core Features

Semantic Sorting: Classifies documents by meaning using SentenceTransformers.
RL-Optimized: Adapts to user files over time to minimize expensive API calls.
AI-Powered Labeling: Creates new categories for unknown files automatically.

Want to see the data?

I've documented every benchmark, failure, and architecture decision in the Wiki. Check out the Metrics, RL Analysis, and NL vs Keywords study.

📚 Explore the Wiki