UF-74636: Mini-Projects: NoSQL, Big Data & Machine Learning

Big Data Management and Analysis (Gestion et Analyse des Méga-données)
Mini-Projects: NoSQL, Big Data & Machine Learning

Opened: Tuesday, 21 October 2025, 00:00

Due: Sunday, 28 December 2025, 00:00

🚀 Mini-Projects: Master 2

9 Projects Combining NoSQL, Big Data & Machine Learning

Choose ONE project • Work in groups • Combine NoSQL + Spark + ML • Present results

📋 Project Overview

📊 What is a Mini-Project?

A comprehensive hands-on project integrating NoSQL databases, distributed processing (Spark/MapReduce), and machine learning algorithms. Build real-world data pipelines!

👥 Team Requirements

Groups of 2-3 students. Pick ONE project from the 9 options. Each team works independently on their chosen problem.

⏱️ Timeline & Deliverables

Deliver: Working code, documentation, presentation, and live demo with results.

👥

Project 1

User Behavior Clustering

Technologies: MongoDB, PySpark, MLlib

🎯 Objective

Collect user activity logs, preprocess with Spark, and cluster users using KMeans to identify behavioral segments.
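
The scaling and clustering steps named in this objective can be illustrated without a cluster. The following is a minimal pure-Python sketch of z-score scaling followed by KMeans (Lloyd's algorithm); it is not the MLlib API. The feature names and the sample values are hypothetical, and the first k points are used as initial centroids only to keep the run repeatable.

```python
import math

def standardize(rows):
    """Rescale every column to zero mean and unit variance (z-score)."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        variance = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(variance) or 1.0)  # constant column: avoid dividing by zero
    return [[(r[d] - means[d]) / stds[d] for d in range(dims)] for r in rows]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical per-user features: [sessions per week, average minutes per session]
users = [[2, 5], [40, 60], [3, 4], [38, 55], [1, 6], [42, 58]]
labels, centroids = kmeans(standardize(users), k=2)
print(labels)
```

Scaling comes first because KMeans relies on Euclidean distance, so a feature with a large numeric range would otherwise dominate the assignment step.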

📚 Skills You'll Learn

MongoDB aggregation pipelines
PySpark DataFrames & transformations
Feature scaling and normalization
KMeans clustering & evaluation

📊 Deliverables

MongoDB database with 100K+ logs
Spark pipeline notebook
Cluster profiles & insights
Visualization of segments

🎯

Project 2

Product Recommendation Engine

Technologies: Neo4j, GraphFrames, ALS

🎯 Objective

Build a social network in Neo4j, export it to Spark, and use collaborative filtering (ALS) to recommend products.
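
To make the alternating idea behind ALS concrete, here is a small pure-Python sketch restricted to one latent factor per user and per item, so each update has a closed form. This is a simplification for illustration, not the MLlib ALS implementation, which uses higher ranks and distributed solves. The ratings, the regularization value, and the function names are assumptions.

```python
import math

def als_rank1(ratings, n_users, n_items, reg=0.01, iterations=20):
    """Alternate closed-form ridge updates: fix item factors, solve users, then swap."""
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iterations):
        for u in range(n_users):
            rated = [(i, r) for (uu, i, r) in ratings if uu == u]
            num = sum(r * item_f[i] for i, r in rated)
            den = sum(item_f[i] ** 2 for i, _ in rated) + reg
            user_f[u] = num / den
        for i in range(n_items):
            raters = [(u, r) for (u, ii, r) in ratings if ii == i]
            num = sum(r * user_f[u] for u, r in raters)
            den = sum(user_f[u] ** 2 for u, _ in raters) + reg
            item_f[i] = num / den
    return user_f, item_f

def rmse(ratings, user_f, item_f):
    errors = [(r - user_f[u] * item_f[i]) ** 2 for (u, i, r) in ratings]
    return math.sqrt(sum(errors) / len(errors))

def mae(ratings, user_f, item_f):
    errors = [abs(r - user_f[u] * item_f[i]) for (u, i, r) in ratings]
    return sum(errors) / len(errors)

# Hypothetical (user, item, rating) triples; user 2 has not rated item 1.
train_ratings = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
user_f, item_f = als_rank1(train_ratings, n_users=3, n_items=2)
train_rmse = rmse(train_ratings, user_f, item_f)
train_mae = mae(train_ratings, user_f, item_f)
predicted_missing = user_f[2] * item_f[1]  # recommendation score for the unseen pair
print(train_rmse, train_mae, predicted_missing)
```

In a real evaluation the RMSE and MAE would be computed on held-out ratings rather than on the training triples used here.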

📚 Skills You'll Learn

Cypher query language (Neo4j)
GraphFrames in Spark
ALS (Alternating Least Squares)
Recommendation evaluation metrics

📊 Deliverables

Neo4j graph with users/products
Rating matrix & ALS model
Top-N recommendations
Accuracy metrics (RMSE, MAE)

😊

Project 3

Real-Time Sentiment Analysis

Technologies: Redis Streams, Spark Streaming, NLP

🎯 Objective

Stream tweets into Redis, process them in real time with Spark, and classify sentiment using Logistic Regression.
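
The classification step can be prototyped offline before wiring it into a stream. The sketch below tokenizes text, builds bag-of-words counts, and fits logistic regression with batch gradient descent in plain Python. It is not Spark code, and the four training sentences, the learning rate, and the function names are illustrative assumptions.

```python
import math
import re

def tokenize(text):
    """Lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def featurize(text, index):
    """Bag-of-words counts over a fixed vocabulary; unknown tokens are ignored."""
    vector = [0.0] * len(index)
    for token in tokenize(text):
        if token in index:
            vector[index[token]] += 1.0
    return vector

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=200, learning_rate=0.5):
    vocab = sorted({t for text, _ in examples for t in tokenize(text)})
    index = {token: j for j, token in enumerate(vocab)}
    data = [(featurize(text, index), label) for text, label in examples]
    weights, bias = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(vocab), 0.0
        for x, y in data:
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        # Batch gradient descent step on the mean log-loss gradient.
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return index, weights, bias

def positive_probability(model, text):
    index, weights, bias = model
    x = featurize(text, index)
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

# Hypothetical labeled messages: 1 = positive, 0 = negative.
examples = [
    ("I love this phone", 1),
    ("Great product, love it", 1),
    ("I hate this phone", 0),
    ("Terrible product, hate it", 0),
]
model = train(examples)
print(positive_probability(model, "love this phone"))
```

In the streaming pipeline, the same scoring function would be applied to each micro-batch of messages read from the stream.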

📚 Skills You'll Learn

Redis Streams for ingestion
Spark Streaming architecture
Text preprocessing (tokenization)
Sentiment classification model

📊 Deliverables

Twitter API integration
Real-time streaming pipeline
Sentiment prediction model
Live dashboard with results

🚨

Project 4

Fraud Detection System

Technologies: Cassandra, PySpark, Anomaly Detection

🎯 Objective

Store transactions in Cassandra, use Spark for ETL and feature engineering, and detect fraud with Isolation Forest.
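
For teams new to Isolation Forest, a compact pure-Python sketch may help show why anomalies receive high scores: random splits isolate unusual points after very few cuts. This is an illustration under stated assumptions (at least two rows, numeric features, hypothetical transaction values), not a library implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    low = min(r[feature] for r in rows)
    high = max(r[feature] for r in rows)
    if low == high:
        return {"size": len(rows)}
    split = rng.uniform(low, high)  # random cut between the observed min and max
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(row, node, depth=0):
    if "size" in node:
        # Leaf: add the expected depth of the subtree that was not grown.
        return depth + average_path_length(node["size"])
    branch = "left" if row[node["feature"]] < node["split"] else "right"
    return path_length(row, node[branch], depth + 1)

def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Scores near 1 suggest anomalies; scores well below 0.5 suggest normal points."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(sample_size)
    return [2 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
            for r in rows]

# Hypothetical (amount, hour of day) features; the last row is an injected outlier.
transactions = [(20.0 + i, 9 + i % 10) for i in range(30)] + [(5000.0, 3)]
scores = anomaly_scores(transactions)
most_suspicious = scores.index(max(scores))
print(most_suspicious)
```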

📚 Skills You'll Learn

Cassandra data modeling (CQL)
Spark ETL pipelines
Feature engineering techniques
Isolation Forest algorithm

📊 Deliverables

Cassandra transaction store
Feature extraction notebook
Anomaly detection model
Fraud patterns analysis

📡

Project 5

IoT Device Forecasting

Technologies: MongoDB, Spark, Linear Regression

🎯 Objective

Simulate IoT sensors, store the time series in MongoDB, extract features with Spark, and forecast future values.
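
Lag features turn forecasting into ordinary regression: the previous reading becomes the input and the current reading becomes the target. The sketch below shows this with a single lag and a closed-form least-squares line in plain Python. The readings are hypothetical and deliberately regular so the fitted line is easy to check by hand; the function names are assumptions.

```python
import math

def make_lag_pairs(series, lag=1):
    """Pair each reading with the reading observed `lag` steps earlier."""
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]

def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical temperature readings from one simulated sensor.
readings = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]
pairs = make_lag_pairs(readings, lag=1)
slope, intercept = fit_line(pairs)

fitted = [slope * x + intercept for x, _ in pairs]
targets = [y for _, y in pairs]
next_value = slope * readings[-1] + intercept  # one-step-ahead forecast
print(next_value, mae(targets, fitted), rmse(targets, fitted))
```

For time series, the evaluation split should respect time order: fit on earlier readings and compute MAE and RMSE on later ones.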

📚 Skills You'll Learn

Time-series data modeling
Feature lag creation
Linear regression in Spark
Forecasting evaluation

📊 Deliverables

IoT sensor simulator
MongoDB time-series DB
Regression model (MAE, RMSE)
Forecast visualization

🔍

Project 6

Search Engine for Catalog

Technologies: Elasticsearch, Spark, TF-IDF

🎯 Objective

Index a product catalog in Elasticsearch and implement relevance ranking with Spark (TF-IDF, word embeddings).
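
As a reference point for the ranking component, the sketch below scores catalog entries against a query with TF-IDF weights and cosine similarity in plain Python. The smoothed IDF formula is one common variant and may differ from what a given library computes; the catalog entries and product IDs are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_idf(token_lists):
    """Smoothed inverse document frequency: log((N + 1) / (df + 1)) + 1."""
    n = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((n + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}

def vectorize(tokens, idf):
    """Sparse TF-IDF vector; tokens outside the catalog vocabulary are dropped."""
    return {t: count * idf[t] for t, count in Counter(tokens).items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, catalog):
    """Return (product_id, score) pairs sorted from most to least similar."""
    idf = build_idf([tokenize(text) for text in catalog.values()])
    query_vec = vectorize(tokenize(query), idf)
    scored = [(pid, cosine(query_vec, vectorize(tokenize(text), idf)))
              for pid, text in catalog.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical catalog entries keyed by product ID.
catalog = {
    "p1": "wireless optical mouse",
    "p2": "mechanical gaming keyboard",
    "p3": "wireless bluetooth headphones",
}
results = rank("wireless mouse", catalog)
print(results)
```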

📚 Skills You'll Learn

Elasticsearch indexing & queries
TF-IDF vectorization
Text similarity algorithms
Ranking optimization

📊 Deliverables

Elasticsearch index
Search API endpoint
Relevance ranking model
Search performance metrics

📚 3 More Projects Available

🛒 Project 7: Market Basket Analysis

Load transactions into Cassandra, use Spark to perform frequent pattern mining (FP-Growth) to identify customer segments.

Skills: Spark MLlib FP-Growth, Cassandra modeling
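
To see what frequent pattern mining produces, the sketch below counts itemset support by brute-force enumeration and derives rule confidence. It is not FP-Growth itself, which reaches the same itemsets far more efficiently through a prefix tree; the enumeration is swapped in here only because it is short enough to verify by hand. The baskets and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(baskets, min_support, max_size=3):
    """Map each itemset (sorted tuple) to its support if it meets the threshold."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    n = len(baskets)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}

def confidence(itemsets, antecedent, consequent):
    """confidence(A -> B) = support(A and B together) / support(A)."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    return itemsets[both] / itemsets[tuple(sorted(antecedent))]

# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
itemsets = frequent_itemsets(baskets, min_support=0.5)
butter_to_bread = confidence(itemsets, ["butter"], ["bread"])
print(itemsets)
print(butter_to_bread)
```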

👑 Project 8: Social Influence Ranking

Store social interactions in Neo4j, use Spark GraphFrames/PageRank to find top influencers and visualize network subgraphs.

Skills: Cypher export, PageRank, Graph visualization.
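
The ranking idea behind PageRank can be sketched with a few lines of power iteration. The version below is plain Python with ranks normalized to sum to 1, which may differ from the normalization a given graph library reports. The edge list, the damping factor of 0.85, and the handling of nodes without outgoing links are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power iteration over a directed edge list of (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

# Hypothetical "follows" edges: (follower, followed).
follows = [
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "bob"),
]
ranks = pagerank(follows)
top_influencer = max(ranks, key=ranks.get)
print(top_influencer, ranks)
```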

📰 Project 9: News Classification

Ingest news streams into Redis, classify articles in real time with Spark (e.g., Naive Bayes), and build a live topic dashboard.

Skills: Streaming ETL, text classification, Redis Streams
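
A multinomial Naive Bayes classifier is small enough to prototype by hand before moving to Spark. The sketch below uses word counts with Laplace smoothing in plain Python; the headlines, the topic labels, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        doc_counts[label] += 1
        for token in text.lower().split():
            word_counts[label][token] += 1
            vocabulary.add(token)
    return doc_counts, word_counts, vocabulary

def classify(model, text, alpha=1.0):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in text.lower().split() if t in vocabulary]
    scores = {}
    for label, n_docs in doc_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in tokens:
            # Laplace smoothing keeps unseen (class, word) pairs from zeroing the score.
            likelihood = (word_counts[label][token] + alpha) / (
                total_words + alpha * len(vocabulary)
            )
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled headlines.
headlines = [
    ("team wins the final match", "sports"),
    ("striker scores late goal", "sports"),
    ("new phone chip released", "tech"),
    ("startup launches cloud software", "tech"),
]
model = train_naive_bayes(headlines)
print(classify(model, "chip startup software"))
print(classify(model, "goal in the final"))
```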

✅ Project Requirements

Technical Requirements

✓ NoSQL Database: One of: MongoDB, Neo4j, Redis, Cassandra, or Elasticsearch
✓ Big Data Processing: PySpark, Spark, or MapReduce
✓ ML Component: MLlib, scikit-learn, or TensorFlow for prediction/clustering
✓ Minimum Data: 100,000+ records or realistic simulation
✓ Code Quality: Documented, tested, production-ready
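
Teams that rely on simulation to reach the minimum data volume could start from a seeded generator such as the sketch below, so every run reproduces the same dataset. The field names, value ranges, and distributions are hypothetical and should be adapted to the chosen project.

```python
import json
import random

def generate_records(n, seed=42):
    """Yield n synthetic activity records; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    actions = ["view", "click", "purchase"]
    for i in range(n):
        yield {
            "user_id": rng.randrange(5000),
            "action": rng.choice(actions),
            "timestamp": 1_700_000_000 + i,  # one event per second, arbitrary start
            "duration_s": round(rng.expovariate(1 / 30), 1),  # mean of about 30 seconds
        }

records = list(generate_records(100_000))
first_line = json.dumps(records[0])  # one JSON document per line suits bulk import
print(len(records), first_line)
```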

Submission Deliverables

📝 Code Repository: GitHub with full source code
📄 Documentation: README, architecture, setup guide
📊 Analysis Report: Results, insights, performance metrics
🎬 Presentation: 15-20 min slides + live demo

🚀 Getting Started

1️⃣

Form Your Team

Form a team of 2-3 students with complementary skills (DB, backend, ML).

2️⃣

Pick Your Project

Read descriptions. Choose based on interests and available resources.

3️⃣

Set Up Environment

Install databases, Spark, Python. Clone starter code if available.

4️⃣

Execute Pipeline

Load data, run processing, train model, collect metrics.

5️⃣

Document & Present

Write report, create slides, prepare demo, submit code.

6️⃣

Present Results

Show your work, discuss insights, answer questions from faculty.

📚 Recommended Resources

MongoDB Documentation

Official MongoDB docs & tutorial
Aggregation pipeline reference
MongoDB University free courses

Apache Spark

PySpark API reference
MLlib documentation
DataCamp Spark courses

scikit-learn

ML algorithms documentation
Model evaluation metrics
Tutorial examples & notebooks
