Implementing a Q-Learning Algorithm with Logistic Regression Normalization in C++

codingd · Jun 4, 2025

Hello everyone!

I wanted to share a project I've been working on that implements a Q-learning algorithm combined with a logistic regression normalization module, all done in C++. The code reads a graph and entity data from text files, processes them, and uses them to learn optimal actions based on the relationships defined in the graph.
Overview

The code is structured to perform the following tasks:

Graph and Entity Loading: It reads a graph from graph.txt, where each line defines a relationship between entities (source, relation, target). It also loads entities from entities.txt, which includes an ID and a set of three feature values.

Vector Representation: Each entity's features are stored as a 3-dimensional vector, allowing for mathematical operations and comparisons.

Normalization Module: The normalization is based on a regularized logistic regression optimized using the RMSProp algorithm. This helps in scaling the feature vectors appropriately.

Q-Learning Integration: The Q-learning algorithm is implemented to learn the best actions to take in various states, with a focus on maximizing the expected rewards.

Distance Calculation: The Canberra distance is used to measure similarity between vectors, which plays a crucial role in determining the rewards for actions taken.

No External Libraries: The implementation relies solely on the C++ Standard Template Library (STL), making it lightweight and easy to understand.

Key Features

Data Structures: The code defines structures for edges and entities, making it easy to manage relationships and features.
Gradient Computation: It includes functions to compute gradients for the logistic regression, which are essential for the optimization process.
Action Selection: The Q-learning part of the code selects actions based on the learned Q-values, allowing the agent to navigate through the graph effectively.
Evaluation: After training, the code evaluates the learned values for each state, providing insights into the agent's performance.

Example Files

Graph File (graph.txt):

Alice friend Bob

Bob friend Charlie

Alice colleague David

Charlie neighbor Eve

David mentor Frank

Eve coworker Alice

Entities File (entities.txt):

Alice 0.10 0.20 0.30

Bob 0.40 0.50 0.60

Charlie 0.70 0.80 0.90

David 0.20 0.30 0.40

Eve 0.50 0.60 0.70

Frank 0.80 0.90 1.00

dummy1 0.00 0.00 0.00

dummy2 0.10 0.10 0.10

Conclusion

I hope this project can serve as a useful reference for anyone interested in reinforcement learning and logistic regression. I welcome any feedback, suggestions, or questions you might have!

Feel free to check out the code below and let me know what you think!

C++:

/*
MIT License
Copyright (c) 2025 CoTon_TiGe_MoUaRf

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <map>
#include <string>
#include <cmath>
#include <cassert>
#include <limits>
#include <cstdlib>
#include <ctime>

// ----- Définition des structures de données pour le graphe et les entités ----- //

struct Edge {
    std::string source;
    std::string relation;
    std::string target;
};

struct Entity {
    std::string id;
    std::vector<double> features;  // dimension 3
};

// Global data for graph and entities
std::vector<Edge> graphEdges;                 // toutes les arêtes du graphe
std::map<std::string, Entity> entitiesMap;    // id -> Entity

// Pour simplifier l’accès aux entités de type "dummy", on suppose que tous les nodes existent dans entitiesMap.

// ----- Fonctions utilitaires de lecture des fichiers ----- //

void loadGraph(const std::string& filename) {
    std::ifstream fin(filename.c_str());
    if (!fin) {
        std::cerr << "Erreur d'ouverture de " << filename << "\n";
        exit(1);
    }
    std::string line;
    while (getline(fin, line)) {
        if (line.empty()) continue;
        std::istringstream iss(line);
        Edge e;
        iss >> e.source >> e.relation >> e.target;
        graphEdges.push_back(e);
    }
    fin.close();
}

void loadEntities(const std::string& filename) {
    std::ifstream fin(filename.c_str());
    if (!fin) {
        std::cerr << "Erreur d'ouverture de " << filename << "\n";
        exit(1);
    }
    std::string line;
    while (getline(fin, line)) {
        if (line.empty()) continue;
        std::istringstream iss(line);
        Entity ent;
        iss >> ent.id;
        double tmp;
        while (iss >> tmp) {
            ent.features.push_back(tmp);
        }
        // Vérifier la dimension attendue (ici 3)
        if (ent.features.size() != 3) {
            std::cerr << "La dimension de " << ent.id << " n'est pas égale à 3.\n";
            exit(1);
        }
        entitiesMap[ent.id] = ent;
    }
    fin.close();
}

// ----- Fonctions mathématiques et de vectorisation ----- //

// Fonction dot product de deux vecteurs
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double result = 0.0;
    for (size_t i = 0; i < a.size(); i++)
        result += a[i] * b[i];
    return result;
}

// Fonction de multiplication scalaire
std::vector<double> scalarMult(const std::vector<double>& v, double a) {
    std::vector<double> result(v.size());
    for (size_t i = 0; i < v.size(); i++)
        result[i] = v[i] * a;
    return result;
}

// Fonction d’addition de deux vecteurs
std::vector<double> vectorAdd(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> result(a.size());
    for (size_t i = 0; i < a.size(); i++)
        result[i] = a[i] + b[i];
    return result;
}

// Fonction de soustraction
std::vector<double> vectorSub(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> result(a.size());
    for (size_t i = 0; i < a.size(); i++)
        result[i] = a[i] - b[i];
    return result;
}

// Fonction d’application de sqrt élément par élément
std::vector<double> vectorSqrt(const std::vector<double>& v) {
    std::vector<double> result(v.size());
    for (size_t i = 0; i < v.size(); i++)
        result[i] = std::sqrt(v[i]);
    return result;
}

// ----- Définition des paramètres globaux pour la Régression Logistique et RMSProp ----- //

struct RegressionConfig {
    double alpha;       // taux d'apprentissage
    double beta;        // taux de décroissance du cache RMSProp
    double epsilon;     // petit terme pour stabilité numérique
    double alpha_reg;   // coefficient de régularisation L2
    int iterations;     // nombre d'itérations
};

RegressionConfig regConf = {0.01, 0.9, 1e-8, 0.001, 1000};

// On travaille dans l'espace de dimension 3 (pour correspondre à nos entités)
const int dim = 3;

// ----- Module de Normalisation par Régression Logistique Régularisée avec RMSProp ----- //

/*
Nous considérons une régression logistique simple :
p(y=1|x; theta) = 1/(1+exp(-theta^Tx))
L'objectif est de maximiser J(theta) = somme{ log(p(y|x;theta)) } - alpha_regR(theta)
Ici, nous supposons que S1 (ensemble d'entraînement) est fourni en interne.
Pour simplifier, nous génèrons quelques exemples artificiels à partir des entités.
*/

// Génère un ensemble d’exemples à partir des entités (S1)
// On va utiliser pour chaque entité son vecteur et une étiquette artificielle déterminée par une simple règle (ex : somme(features) > seuil)
// Ceci est uniquement pour illustrer l’optimisation du modèle de normalisation.
struct Example {
    std::vector<double> x; // vecteur d'entrée
    int y;                 // label 0 ou 1
};

std::vector<Example> generateExamples() {
    std::vector<Example> examples;
    for (auto &kv : entitiesMap) {
        // On ignore les entités dummy si leur id commence par "dummy"
        if (kv.first.find("dummy") != std::string::npos)
            continue;
        Example ex;
        ex.x = kv.second.features;
        double sum = 0.0;
        for (auto val : ex.x) sum += val;
        // règle simple d'étiquetage : 1 si somme > 1.0, sinon 0
        ex.y = (sum > 1.0) ? 1 : 0;
        examples.push_back(ex);
    }
    return examples;
}

// Fonction sigmoïde
double sigmoid(double z) {
    return 1.0 / (1.0 + std::exp(-z));
}

// Fonction qui calcule la prédiction : p(y=1|x; theta)
double predict(const std::vector<double>& theta, const std::vector<double>& x) {
    double z = dot(theta, x);
    return sigmoid(z);
}

// Fonction qui calcule le gradient de la log-vraisemblance (avec régularisation L2)
// Pour un exemple: grad = (y - prediction)*x - 2 * alpha_reg * theta
std::vector<double> computeGradient(const std::vector<double>& theta, const Example& ex) {
    std::vector<double> grad(dim, 0.0);
    double prediction = predict(theta, ex.x);
    double error = ex.y - prediction;  // gradient de log-vraisemblance
    for (int i = 0; i < dim; i++) {
        // Note : on inverse le signe pour maximiser la vraisemblance,
        // ou l'on dit que nous descendons sur le négatif de la log-vraisemblance.
        grad[i] = error * ex.x[i] - 2 * regConf.alpha_reg * theta[i];
    }
    return grad;
}

// Procédure NORMALISATION_RMSPROP : entrainement de theta selon S1 et S2 (ici S2 sera évalué très simplement)
std::vector<double> NORMALISATION_RMSPROP(const std::vector<Example>& S1, const std::vector<Example>& /*S2*/) {
    // Initialiser theta de dimension dim aléatoirement et cache v
    std::vector<double> theta(dim, 0.0);
    std::vector<double> v(dim, 0.0);
    // Initialisation aléatoire
    srand(time(0));
    for (int i = 0; i < dim; i++) {
        theta[i] = ((double)rand() / (double)RAND_MAX - 0.5) * 0.1; // petit aléatoire
        v[i] = 0.0;
    }

    // Itérations de formation sur S1
    for (int iter = 0; iter < regConf.iterations; iter++) {
        // Mise à jour pour chaque exemple
        for (size_t idx = 0; idx < S1.size(); idx++) {
            std::vector<double> grad = computeGradient(theta, S1[idx]);
            // Mise à jour RMSProp pour chaque paramètre
            for (int i = 0; i < dim; i++) {
                // Update du cache RMSProp
                v[i] = regConf.beta * v[i] + (1 - regConf.beta) * grad[i] * grad[i];
                // Mise à jour de theta (descente de gradient)
                theta[i] = theta[i] + regConf.alpha / (std::sqrt(v[i]) + regConf.epsilon) * grad[i];
            }
        }
        // (Optionnel) On pourrait évaluer la fonction de coût sur un ensemble de validation S2.
    }
    return theta;
}

// Fonction de transformation (normalisation) : NORM(z) = Transformation(z; theta_optimal)
// Ici, pour la normalisation, nous utilisons une régression logistique qui prend le vecteur brut et renvoie une valeur dans [0,1],
// on définit la transformation comme le vecteur original multiplié par le coefficient de normalisation (ex: un simple scaling).
std::vector<double> NORM(const std::vector<double>& z, const std::vector<double>& theta_optimal) {
    // Pour chaque composant, on peut multipler z[i] par sigmoid(theta_optimal[i])
    std::vector<double> result(z.size());
    for (size_t i = 0; i < z.size(); i++) {
        result[i] = z[i] * sigmoid(theta_optimal[i]);
    }
    return result;
}

// ----- Modules VECTORIEL : TF, CanberraDistance, extraction de score ----- //

// TF(t, d) : fréquence terme t dans le document d (ici simplifié sur un vecteur)
double TF(double term, const std::vector<double>& d) {
    // Ici on considère term comme une valeur attendue et compare aux valeurs du vecteur d
    int count = 0;
    for (size_t i = 0; i < d.size(); i++) {
        if (fabs(d[i] - term) < 1e-6) count++;
    }
    return double(count) / double(d.size());
}

// Distance Canberra : calcule la distance entre deux vecteurs
double CanberraDistance(const std::vector<double>& X, const std::vector<double>& Y) {
    assert(X.size() == Y.size());
    double distance = 0.0;
    for (size_t i = 0; i < X.size(); i++) {
        double num = fabs(X[i] - Y[i]);
        double den = fabs(X[i]) + fabs(Y[i]);
        if (den != 0)
            distance += num / den;
        else
            distance += 0;
    }
    return distance;
}

// Score(s, t) : convertit la distance en score; plus la distance est faible, meilleur est le score
double Score(const std::vector<double>& xs, const std::vector<double>& xt) {
    double score_distance = CanberraDistance(xs, xt);
    double score = 1.0 / (1.0 + score_distance);
    return score;
}

// ----- QLearning et navigation dans le graphe ----- //

// Pour la navigation dans le graphe, on définit une fonction qui renvoie
// les actions possibles (pour un état donné, ici identifié par le nom de l'entité)
// On parcourt graphEdges et on renvoie les paires (relation, target) pour lesquelles la source correspond.
std::vector<std::pair<std::string, std::string>> obtenir_actions(const std::string& etat) {
    std::vector<std::pair<std::string, std::string>> actions;
    for (size_t i = 0; i < graphEdges.size(); i++) {
        if (graphEdges[i].source == etat)
            actions.push_back(std::make_pair(graphEdges[i].relation, graphEdges[i].target));
    }
    return actions;
}

// Fonction de transition δ(s, a) : pour un état s et une action a (représentée ici par la cible de la relation),
// retourne le vecteur associé à la cible.
std::vector<double> delta(const std::string& s, const std::pair<std::string, std::string>& action) {
    // Ici action.second est la cible.
    // On retourne le vecteur correspondant (brut)
    if (entitiesMap.find(action.second) != entitiesMap.end()) {
        return entitiesMap[action.second].features;
    } else {
        // Si non trouvé, retourner un vecteur nul
        return std::vector<double>(dim, 0.0);
    }
}

// Pour la fonction f(score) : transformation de la similarité en récompense immédiate.
// Ici, nous simplifions: on retourne score * 10.
double f_score(double score) {
    return score * 10.0;
}

// ----- Structures pour QLearning ----- //

struct StateAction {
    std::string state;   // représenté par l'identifiant de l'entité
    std::string action;  // que l'on définit ici par la relation choisie (l'action conduit à une cible)
};

struct QLearningConfig {
    double gamma;    // facteur d'actualisation
    int nbEpisodes;  // nombre d'épisodes d'entraînement
};

QLearningConfig qlConf = {0.9, 100};

// Pour Q(s,a) et V(s), nous utilisons des maps indexées par (state, action) ou state.
std::map<std::string, double> V;     // valeur d'un état (indexé par id de l'entité)
std::map<std::string, std::map<std::string, double>> Q; // Q[state][action]

// Mise à jour de Q-learning lors du passage dans un état non terminal.
void QLearningEpisode(const std::vector<double>& theta_optimal) {
    // Choix aléatoire d'un exemple initial parmi les entités (sauf dummy)
    std::vector<std::string> entity_ids;
    for (auto &kv : entitiesMap) {
        if (kv.first.find("dummy") == std::string::npos)
            entity_ids.push_back(kv.first);
    }
    if (entity_ids.empty()) return;
    int idx = rand() % entity_ids.size();
    std::string s_current = entity_ids[idx];

    // Normaliser l'état s : d'abord vecteur brut
    std::vector<double> s_brut = entitiesMap[s_current].features;
    std::vector<double> s_norm = NORM(s_brut, theta_optimal);

    bool terminal = false;
    int stepCounter = 0;
    while (!terminal && stepCounter < 10) { // on impose une limite de pas pour éviter une boucle infinie
        // Obtenir les actions possibles dans le graphe pour l'état s_current (identifié par le nom)
        std::vector<std::pair<std::string, std::string>> actions_possible = obtenir_actions(s_current);
        if (actions_possible.empty()) {
            break; // état terminal, pas d'action possible
        }
        // Pour chaque action, calculer Q(s,a)
        for (size_t i = 0; i < actions_possible.size(); i++) {
            std::string a = actions_possible[i].first;
            // Transition : obtenir target
            std::vector<double> t_brut = delta(s_current, actions_possible[i]);
            std::vector<double> t_norm = NORM(t_brut, theta_optimal);
            double score = Score(s_norm, t_norm);
            double r_immed = f_score(score);
            // On récupère V(t) : on utilise l'id de la cible comme identifiant d'état t
            double v_t = 0.0;
            std::string id_t = actions_possible[i].second;
            if (V.find(id_t) != V.end()) {
                v_t = V[id_t];
            }
            Q[s_current][a] = r_immed + qlConf.gamma * v_t;
        }
        // Sélection de l'action a* avec max Q
        double bestQ = -std::numeric_limits<double>::infinity();
        std::string bestAction;
        std::string bestTarget;
        for (size_t i = 0; i < actions_possible.size(); i++) {
            std::string a = actions_possible[i].first;
            if (Q[s_current][a] > bestQ) {
                bestQ = Q[s_current][a];
                bestAction = a;
                bestTarget = actions_possible[i].second;
            }
        }
        // Mise à jour de V(s)
        V[s_current] = bestQ;
        // Transition : s <- δ(s, a*)
        s_current = bestTarget;
        // Actualiser s_norm avec le nouveau vecteur
        if (entitiesMap.find(s_current) != entitiesMap.end())
            s_norm = NORM(entitiesMap[s_current].features, theta_optimal);
        else
            s_norm = std::vector<double>(dim, 0.0);
        // Condition d'arrêt optionnelle : si on atteint une entité sans action
        if (obtenir_actions(s_current).empty()) {
            terminal = true;
        }
        stepCounter++;
    }
}

// Extraction de la politique optimale : pour chaque état, choisir l'action avec le max Q
std::map<std::string, std::string> extractionPolitiqueOptimale() {
    std::map<std::string, std::string> policy;
    for (auto &entry : Q) {
        std::string state = entry.first;
        double bestQ = -std::numeric_limits<double>::infinity();
        std::string bestAction;
        for (auto &act : entry.second) {
            if (act.second > bestQ) {
                bestQ = act.second;
                bestAction = act.first;
            }
        }
        policy[state] = bestAction;
    }
    return policy;
}

// ----- (Optionnel) Entraînement Compositionnel : ici illustré par une fonction vide ----- //
// ----- (Optionnel) Entraînement Compositionnel : mise à jour des représentations vectorielles ----- //
void EntrainementCompositionnel(const std::vector<Example>& ensemble_exemples) {
    double learning_rate = 0.01; // Taux d'apprentissage
    double margin = 1.0;          // Marge pour la hinge loss

    for (const auto& ex : ensemble_exemples) {
        // Retrouver l'entité correspondante en cherchant par vecteur features
        std::string entity_id = "";
        for (const auto& kv : entitiesMap) {
            if (kv.second.features == ex.x) {
                entity_id = kv.first;
                break;
            }
        }
        if (entity_id == "") {
            // Entité non trouvée, on passe
            continue;
        }

        Entity& entity = entitiesMap[entity_id];
        std::vector<double>& features = entity.features; // référence vers les features

        // Calculer la prédiction avec produit scalaire
        double prediction = dot(features, ex.x);

        // Mise à jour hinge loss
        if (ex.y == 1 && prediction < margin) {
            for (size_t i = 0; i < features.size(); ++i) {
                features[i] += learning_rate * (1.0 - prediction);
            }
        } else if (ex.y == 0 && prediction > -margin) {
            for (size_t i = 0; i < features.size(); ++i) {
                features[i] -= learning_rate * (prediction + margin);
            }
        }

        // La modification est directement sur entity.features via référence
    }
}



// ----- Fonction d'Évaluation Globale (très simplifiée) ----- //
void Evaluation() {
    std::cout << "Evaluation sur l'ensemble des entités (simulée) : \n";
    // Par exemple, nous pourrions afficher le V(s) pour chaque état et un hit@10 simulé, etc.
    for (auto &kv : V)
        std::cout << "Etat " << kv.first << " : V = " << kv.second << "\n";
}

// ----- Fonction principale "Main" ----- //
int main() {
    srand(time(0));
    // 1. Chargement/initialisation du graphe et des ensembles
    loadGraph("graph.txt");
    loadEntities("entities.txt");

    // 2. Génération des exemples (S1)
    std::vector<Example> exemples = generateExamples();
    // Pour S2 (validation), on peut utiliser exactement les mêmes exemples pour cette illustration
    std::vector<Example> exemplesValidation = exemples;

    // 3. Phase de normalisation : apprentissage de theta_optimal avec RMSProp
    std::vector<double> theta_optimal = NORMALISATION_RMSPROP(exemples, exemplesValidation);

    std::cout << "Theta optimal obtenu par RMSProp :\n";
    for (size_t i = 0; i < theta_optimal.size(); i++) {
        std::cout << theta_optimal[i] << " ";
    }
    std::cout << "\n\n";

    // 4. Apprentissage par QLearning avec normalisation
    for (int episode = 0; episode < qlConf.nbEpisodes; episode++) {
        QLearningEpisode(theta_optimal);
    }

    // 5. Extraction de la politique optimale
    std::map<std::string, std::string> policy = extractionPolitiqueOptimale();
    std::cout << "Politique optimale extraite (état -> action):\n";
    for (auto &p : policy)
        std::cout << p.first << " -> " << p.second << "\n";

    // 6. (Optionnel) Entraînement compositionnel
    EntrainementCompositionnel(exemples);

    // 7. Phase d'évaluation
    Evaluation();

    return 0;
}

Thanks for reading!

Chatbot	0	Oct 8, 2024
Database Manager: A C++ Console Application	14	May 12, 2025
Building a Professional Neural Network Framework: Full C++ Implementation with Windows GUI and Real-Time Training Visualization	11	Jun 14, 2026
Filter sober in c++ don't pass test	0	Dec 2, 2023
Rich Text Format (RTF) Document Builder in C++: Code and Features	0	Sep 28, 2025
Universal BMP Steganography Tool (AES-128-CTR + SP800-90A CSPRNG) Full Encoder/Decoder with 3LSB Payload, PasswordDerived Key & External Key File	4	Mar 26, 2026
Mini Web Server in C++ (Part One)	4	Oct 2, 2025
Secure Keyboard v2.0 Modern C++ Virtual Keyboard for Windows (Glassmorphism UI, Clipboard Auto-Clear)	0	Mar 26, 2026

Implementing a Q-Learning Algorithm with Logistic Regression Normalization in C++

codingd

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads