Academic Year 2024/2025

Big Data Programming (COMP47470)

Subject: Computer Science
College: Science
School: Computer Science
Level: 4 (Masters)
Credits: 5
Module Coordinator: Dr Ravi Manumachu
Trimester: Autumn and Spring (separate)
Mode of Delivery: On Campus
Internship Module: No
How will I be graded? Letter grades

Curricular information is subject to change.

Big data refers to high-volume, high-velocity and/or high-variety data that is too complex for traditional (relational) data management and data processing systems to handle. The data-intensive nature of big data applications has pushed research and industry practitioners to build innovative solutions that are inherently distributed software systems with novel programming and execution models. This module describes, compares and contrasts the pioneering and leading big data technologies (NoSQL, batch, streaming, and graph). It teaches students how to install a big data software technology (NoSQL, batch, streaming, or graph) and how to develop (code and test) a big data application using that technology.

About this Module

Learning Outcomes:

(a). Explain and illustrate properties of traditional and big data management and processing systems (ACID, CAP, BASE).
(b). Compare and contrast relational, NoSQL and NewSQL data management systems.
(c). Describe, distinguish, and use big data technologies for batch, stream, and graph processing.
(d). Develop (design and implement) NoSQL, batch, streaming and graph big data applications.

Indicative Module Content:

Introduction to Big Data (Characteristics and classifications)
Database Concepts, Architecture, Database Modelling and Design
The Relational Data Model, SQL, and Introduction to MySQL
Introduction to NoSQL Databases and the MongoDB document database
MapReduce Programming Model, Introduction to Apache Hadoop, HDFS
Distributed Data Processing using Apache Spark
Introduction to Graph processing and developing large graph applications using Spark's GraphX
Introduction to Data Streams and developing streaming applications using Spark's Structured Streaming API
Machine learning (supervised, unsupervised, recommendation) using Spark's MLlib API

Student Effort Hours:
Student Effort Type: Hours
Autonomous Student Learning: 62
Lectures: 24
Practical: 24
Total: 110


Approaches to Teaching and Learning:
Lectures, Laboratory Practicals, Weekly Quizzes, Mid-term Assignments, End-term Exam

Requirements, Exclusions and Recommendations
Learning Recommendations:

It is strongly recommended that students have an acceptable level of competency in Bash scripting and the Python programming language.


Module Requisites and Incompatibles
Not applicable to this module.
 

Assessment Strategy

Exam (In-person): 2-hour End of Trimester Exam
Timing: End of trimester (Duration: 2 hours)
Component Scale: Alternative linear conversion grade scale 40%
Must Pass Component: No
% of Final Grade: 40
In Module Component Repeat Offered: No

Assignment (Including Essay): Continuous Assessment Throughout the Trimester
Timing: Week 3, Week 6, Week 9, Week 11, Week 12
Component Scale: Alternative linear conversion grade scale 40%
Must Pass Component: No
% of Final Grade: 50
In Module Component Repeat Offered: No

Quizzes/Short Exercises: A quiz each week comprising questions from the lecture delivered in the same week
Timing: Week 2, Week 3, Week 4, Week 5, Week 6, Week 7, Week 8, Week 9, Week 10, Week 11
Component Scale: Alternative linear conversion grade scale 40%
Must Pass Component: No
% of Final Grade: 10
In Module Component Repeat Offered: No

Carry forward of passed components: Yes

Remediation Type: Repeat
Remediation Timing: Within Two Trimesters
Please see Student Jargon Buster for more information about remediation types and timing. 

Feedback Strategy/Strategies

• Feedback individually to students, post-assessment
• Group/class feedback, post-assessment
• Online automated feedback

How will my Feedback be Delivered?

Solutions to weekly quizzes. Solutions to continuous assessment assignments.

Fundamentals of Database Systems, 7th Edition
by Ramez Elmasri and Shamkant Navathe

Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale, 4th Edition
by Tom White

Spark: The Definitive Guide: Big Data Processing Made Simple
by Bill Chambers and Matei Zaharia

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by Martin Kleppmann