Big Data Management and Analysis
Mini-Projects: NoSQL, Big Data & Machine Learning

Opened: Tuesday, 21 October 2025, 00:00
Due: Sunday, 28 December 2025, 00:00
🚀 Mini-Projects: Master 2

9 Projects Combining NoSQL, Big Data & Machine Learning

Choose ONE project • Work in groups • Combine NoSQL + Spark + ML • Present results

📋 Project Overview

📊 What is a Mini-Project?

A comprehensive hands-on project integrating NoSQL databases, distributed processing (Spark/MapReduce), and machine learning algorithms. Build real-world data pipelines!

👥 Team Requirements

Groups of 2-3 students. Pick ONE project from the 9 options. Each team works independently on their chosen problem.

⏱️ Timeline & Deliverables

Deliver: working code, documentation, a presentation, and a live demo with results.

👥

Project 1

User Behavior Clustering

Technologies: MongoDB, PySpark, MLlib

🎯 Objective

Collect user activity logs, preprocess with Spark, and cluster users using KMeans to identify behavioral segments.

📚 Skills You'll Learn
  • MongoDB aggregation pipelines
  • PySpark DataFrames & transformations
  • Feature scaling and normalization
  • KMeans clustering & evaluation
📊 Deliverables
  • MongoDB database with 100K+ logs
  • Spark pipeline notebook
  • Cluster profiles & insights
  • Visualization of segments
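
To make the scaling and clustering steps of Project 1 concrete, here is a minimal sketch in plain Python (standard library only, so it runs without a cluster). It is not the MLlib implementation: the two user features, the sample values, and the choice to seed centroids with the first k points are assumptions made for illustration. In the project itself, Spark MLlib would do this work on the full dataset.

```python
import math

def min_max_scale(rows):
    """Rescale every column to [0, 1] so no single feature dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical features per user: (sessions per week, average minutes per session).
users = [(1, 5), (30, 60), (2, 4), (28, 55), (3, 6), (31, 58)]
scaled = min_max_scale(users)
labels, centroids = kmeans(scaled, k=2)
```

Scaling first matters because KMeans relies on Euclidean distance: without it, the feature with the largest numeric range would decide the clusters on its own.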
🎯

Project 2

Product Recommendation Engine

Technologies: Neo4j, GraphFrames, ALS

🎯 Objective

Build a social network in Neo4j, extract it to Spark, and use collaborative filtering to recommend products.

📚 Skills You'll Learn
  • Cypher query language (Neo4j)
  • GraphFrames in Spark
  • ALS (Alternating Least Squares)
  • Recommendation evaluation metrics
📊 Deliverables
  • Neo4j graph with users/products
  • Rating matrix & ALS model
  • Top-N recommendations
  • Accuracy metrics (RMSE, MAE)
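
The evaluation deliverables for Project 2 (RMSE, MAE, Top-N) can be sketched in a few lines of plain Python. The ratings, predictions, and product ids below are made-up illustration values, not output from an ALS model.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more than small ones."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: the average size of a miss, in rating units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def top_n(scores, n):
    """Return the n highest-scoring product ids from a {product: score} mapping."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [product for product, _ in ranked[:n]]

# Hypothetical held-out ratings and the model's predictions for them.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.0, 3.0, 4.0, 3.0]
error_rmse = rmse(actual, predicted)
error_mae = mae(actual, predicted)

# Hypothetical predicted scores for one user.
recommendations = top_n({"p1": 4.2, "p2": 3.1, "p3": 4.8}, n=2)
```

Reporting both metrics is useful: a gap between RMSE and MAE suggests that a few predictions are far off rather than all of them being slightly off.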
😊

Project 3

Real-Time Sentiment Analysis

Technologies: Redis Streams, Spark Streaming, NLP

🎯 Objective

Stream tweets into Redis, process them in real time with Spark, and classify sentiment using Logistic Regression.

📚 Skills You'll Learn
  • Redis Streams for ingestion
  • Spark Streaming architecture
  • Text preprocessing (tokenization)
  • Sentiment classification model
📊 Deliverables
  • Twitter API integration
  • Real-time streaming pipeline
  • Sentiment prediction model
  • Live dashboard with results
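
The text preprocessing step of Project 3 might look like the sketch below. The cleaning rules (lowercasing, dropping URLs and @mentions) are one reasonable choice among many, not a prescribed pipeline, and the sample tweet is invented.

```python
import re
from collections import Counter

# Assumed cleaning rules: URLs and @mentions carry little sentiment, so drop them.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
TOKEN = re.compile(r"[a-z0-9']+")

def tokenize(text):
    """Lowercase the text, strip URLs and mentions, and split into word tokens."""
    cleaned = URL_OR_MENTION.sub(" ", text.lower())
    return TOKEN.findall(cleaned)

def bag_of_words(text):
    """Count token occurrences; a classifier can consume these counts as features."""
    return Counter(tokenize(text))

tokens = tokenize("Loving the new phone!! @shop https://t.co/x #happy")
counts = bag_of_words("good good bad")
```

The same function can be applied to each micro-batch of the stream before the counts are handed to the sentiment model.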
🚨

Project 4

Fraud Detection System

Technologies: Cassandra, PySpark, Anomaly Detection

🎯 Objective

Store transactions in Cassandra, use Spark for ETL & feature engineering, detect fraud with Isolation Forest.

📚 Skills You'll Learn
  • Cassandra data modeling (CQL)
  • Spark ETL pipelines
  • Feature engineering techniques
  • Isolation Forest algorithm
📊 Deliverables
  • Cassandra transaction store
  • Feature extraction notebook
  • Anomaly detection model
  • Fraud patterns analysis
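
As an example of the feature engineering step in Project 4, the sketch below computes how unusual each transaction amount is for its own account. This is only one candidate feature, and it is not Isolation Forest itself: the detector (for instance scikit-learn's `IsolationForest`) would be trained on features like this one. The `(account, amount)` record layout is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev

def amount_zscores(transactions):
    """For each (account, amount) pair, return the number of standard
    deviations the amount lies from that account's mean amount."""
    by_account = defaultdict(list)
    for account, amount in transactions:
        by_account[account].append(amount)
    stats = {acc: (mean(vals), pstdev(vals)) for acc, vals in by_account.items()}
    features = []
    for account, amount in transactions:
        mu, sigma = stats[account]
        # An account whose amounts never vary gets a neutral score of 0.
        features.append((amount - mu) / sigma if sigma > 0 else 0.0)
    return features

# Hypothetical transactions: account "a" has one unusually large payment.
transactions = [("a", 10), ("a", 10), ("a", 10), ("a", 50), ("b", 5), ("b", 5)]
zscores = amount_zscores(transactions)
```

In Spark, the same per-account statistics would typically come from a `groupBy` followed by a join back onto the transactions.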
📡

Project 5

IoT Device Forecasting

Technologies: MongoDB, Spark, Linear Regression

🎯 Objective

Simulate IoT sensors, store the time series in MongoDB, extract features with Spark, and forecast future values.

📚 Skills You'll Learn
  • Time-series data modeling
  • Feature lag creation
  • Linear regression in Spark
  • Forecasting evaluation
📊 Deliverables
  • IoT sensor simulator
  • MongoDB time-series DB
  • Regression model (MAE, RMSE)
  • Forecast visualization
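
Lag features and a linear fit, the core of Project 5, can be illustrated with a short plain-Python sketch. The single lag, the one-feature least-squares fit, and the temperature readings are simplifying assumptions; a real pipeline would use several lags and Spark's regression tools.

```python
def make_lag_pairs(series, lag=1):
    """Pair each reading with the reading `lag` steps earlier: (past, current)."""
    return [(series[i - lag], series[i]) for i in range(lag, len(series))]

def fit_line(pairs):
    """Ordinary least squares for current = slope * past + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical temperature readings rising by 0.5 per time step.
readings = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]
slope, intercept = fit_line(make_lag_pairs(readings))

# One-step-ahead forecast from the most recent reading.
next_value = slope * readings[-1] + intercept
```

When evaluating, split the series by time (train on the past, test on the future) rather than at random, otherwise the model sees readings from after the ones it is asked to predict.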
🔍

Project 6

Search Engine for Catalog

Technologies: Elasticsearch, Spark, TF-IDF

🎯 Objective

Index a product catalog in Elasticsearch and implement ranking with Spark (TF-IDF, word embeddings).

📚 Skills You'll Learn
  • Elasticsearch indexing & queries
  • TF-IDF vectorization
  • Text similarity algorithms
  • Ranking optimization
📊 Deliverables
  • Elasticsearch index
  • Search API endpoint
  • Relevance ranking model
  • Search performance metrics
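
The TF-IDF ranking idea behind Project 6 can be sketched without any search engine. The code below uses a smoothed IDF, log((n + 1) / (df + 1)), and cosine similarity; the three-item catalog and the query are invented, and this is a teaching sketch rather than how Elasticsearch scores documents.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document, plus the IDF table."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: terms that appear in many documents get a lower weight.
    idf = {t: math.log((n + 1) / (df + 1)) for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def search(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tf_idf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Hypothetical catalog of tokenized product titles.
catalog = [
    "red cotton shirt".split(),
    "blue denim jeans".split(),
    "red leather shoes".split(),
]
ranking = search("leather shoes".split(), catalog)
```

Query terms that never occur in the catalog receive a weight of zero here, so they do not affect the ranking.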

📚 3 More Projects Available

🛒 Project 7: Market Basket Analysis


Load transactions into Cassandra, use Spark to perform frequent pattern mining (FP-Growth) to identify customer segments.

Skills: Spark MLlib FP-Growth, Cassandra modeling
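
To show what frequent pattern mining produces in Project 7, the sketch below counts item pairs by brute force and computes the confidence of a rule. This is deliberately not FP-Growth: it enumerates pairs directly, which is only practical on small data, but for pairs it yields the same supports FP-Growth would report. The baskets are invented.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs found in at least
    a min_support fraction of the baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that (bread, milk) and (milk, bread) count as the same pair.
        counts.update(combinations(sorted(set(basket)), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

# Hypothetical transactions.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "milk", "eggs"],
    ["butter", "eggs"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
bread_to_milk = confidence(baskets, "bread", "milk")
```

Support tells you how common a pattern is, while confidence tells you how reliable the rule "bread implies milk" is among baskets that contain bread.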

👑 Project 8: Social Influence Ranking


Store social interactions in Neo4j, use Spark GraphFrames/PageRank to find top influencers and visualize network subgraphs.

Skills: Cypher export, PageRank, graph visualization
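
The PageRank computation at the heart of Project 8 can be sketched as a power iteration in plain Python. The damping factor of 0.85, the fixed iteration count, the handling of dangling nodes, and the tiny follower graph are all assumptions for illustration; GraphFrames may normalize its scores differently.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power iteration over a directed edge list of (follower, followed) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node keeps a small baseline share of the total rank.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

# Hypothetical "follows" relationships exported from the graph database.
edges = [("ali", "sara"), ("omar", "sara"), ("lina", "sara"), ("sara", "ali")]
ranks = pagerank(edges)
top_influencer = max(ranks, key=ranks.get)
```

With this formulation the ranks always sum to 1, which makes them easy to compare across nodes.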

📰 Project 9: News Classification


Ingest news streams into Redis, classify articles in real time with Spark (e.g., Naive Bayes), and build a live topic dashboard.

Skills: Streaming ETL, text classification, Redis Streams
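
For Project 9, a multinomial Naive Bayes classifier with add-one smoothing fits in a short plain-Python sketch. The four training headlines and the two topic labels are invented, and a real project would train on far more data with Spark.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """labeled_docs is a list of (tokens, label) pairs. Returns the counts needed to classify."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training headlines, already tokenized.
training = [
    ("team wins final match".split(), "sport"),
    ("striker scores late goal".split(), "sport"),
    ("bank raises interest rates".split(), "economy"),
    ("market rates fall sharply".split(), "economy"),
]
model = train_naive_bayes(training)
sport_guess = classify("goal in final match".split(), model)
economy_guess = classify("interest rates".split(), model)
```

Working in log space avoids numeric underflow when articles contain many words.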


✅ Project Requirements

Technical Requirements

  • ✓ NoSQL Database: One of: MongoDB, Neo4j, Redis, Cassandra, or Elasticsearch
  • ✓ Big Data Processing: PySpark, Spark, or MapReduce
  • ✓ ML Component: MLlib, scikit-learn, or TensorFlow for prediction/clustering
  • ✓ Minimum Data: 100,000+ records or realistic simulation
  • ✓ Code Quality: Documented, tested, production-ready
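
If no real dataset is available, the "realistic simulation" option for the 100,000-record minimum could start from a generator like the sketch below. Every field name, the action mix, and the 30-day window are hypothetical choices to adapt to your project. JSON Lines is used as the output format because both `mongoimport` and Spark's JSON reader accept one document per line.

```python
import json
import random
from datetime import datetime, timedelta

def simulate_events(n, seed=42):
    """Yield n synthetic activity records; the field names are illustrative only."""
    rng = random.Random(seed)  # a fixed seed makes the dataset reproducible
    start = datetime(2025, 1, 1)
    actions = ["view", "search", "add_to_cart", "purchase"]
    for i in range(n):
        offset = timedelta(seconds=rng.randrange(30 * 24 * 3600))
        yield {
            "event_id": i,
            "user_id": rng.randrange(5_000),
            # Weighted choice: browsing is far more common than purchasing.
            "action": rng.choices(actions, weights=[60, 25, 10, 5])[0],
            "timestamp": (start + offset).isoformat(),
        }

def write_jsonl(path, events):
    """Write one JSON document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")

events = list(simulate_events(100_000))
```

Calling `write_jsonl("events.jsonl", events)` (the file name is an example) produces a file ready to load into the database of your choice.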

Submission Deliverables

  • 📝 Code Repository: GitHub with full source code
  • 📄 Documentation: README, architecture, setup guide
  • 📊 Analysis Report: Results, insights, performance metrics
  • 🎬 Presentation: 15-20 min slides + live demo

🚀 Getting Started

1️⃣

Form Your Team

Recruit 2-3 team members. Choose complementary skills (DB, backend, ML).

2️⃣

Pick Your Project

Read descriptions. Choose based on interests and available resources.

3️⃣

Set Up Environment

Install databases, Spark, Python. Clone starter code if available.

4️⃣

Execute Pipeline

Load data, run processing, train model, collect metrics.

5️⃣

Document & Present

Write report, create slides, prepare demo, submit code.

6️⃣

Present Results

Show your work, discuss insights, answer questions from faculty.

📚 Recommended Resources

MongoDB Documentation

  • Official MongoDB docs & tutorial
  • Aggregation pipeline reference
  • MongoDB University free courses

Apache Spark

  • PySpark API reference
  • MLlib documentation
  • DataCamp Spark courses

scikit-learn

  • ML algorithms documentation
  • Model evaluation metrics
  • Tutorial examples & notebooks