Ajay Gupta
Professor

IEEE-CS TAC Vice-Chair (2022-24)
IEEE-CS TMRC Member (2013 - )
IEEE-CS TAC Vice-Chair (2015-16)
IEEE-CS TCPP Chair (2011-2015)

Director, WiSe Lab

Western Michigan University
Dept. of Computer Science
Mail Stop 5466
4601 Campus Drive
Kalamazoo, MI, 49008-5466


Phone: 269-276-3101 / 3104
Fax: 269-276-3122
E-Mail: ajay dot gupta at wmich dot edu

CS6100 - Big Data Storage, Retrieval and Processing                                

FIRST DRAFT - TO BE UPDATED

Time & Place: M W 04:00pm-05:15pm, Call Numbers: 44174, 3 credit hours

Prerequisites:

By Courses: CS3100 or CS3310 or equivalent with a grade of C or better; or permission of the instructor.

By Topic: Advanced understanding of high-level language programming - conditional structures; looping structures; arrays; program logic to solve problems; object-oriented programming - be able to create and use objects; software life cycle; validating the quality of software produced; sorting and searching algorithms; data structures (linked lists, queues, stacks, hash maps); divide-and-conquer, greedy, and backtracking algorithm design techniques; documenting programs effectively and efficiently; teamwork. Proficiency in Python, R, Java, C, C++, or C# beyond the experience in CS1120. Low-level, systems, and Linux programming experience is desired. Some mathematics and statistics background. Exposure to machine learning and to Python & R programming is a plus. Strong desire, self-motivation & dedication to learn and contribute to the interesting area of big data computing (mainly storage, retrieval and processing of big data).

This course is a 3 credit hour advanced undergraduate or graduate level course, intended for students who plan to design and develop applications, or pursue R&D, in the exciting area of big data computing. The primary objective of this course is to familiarize students with the most important information technologies used in manipulating, storing, and analyzing big data. Focusing mainly on big data models, reduction techniques, application architectures and big data analytics using cloud-based and other related services, this course will provide students with the knowledge and hands-on experience needed to design and implement big data applications. Topics may include the characteristics and challenges of big data, state-of-the-art computing paradigms and platforms (e.g., MapReduce), big data programming tools (e.g., Hadoop, GFS, MongoDB and various cloud-based tools), big data extraction and integration, big data storage, scalable indexing for big data, big graph processing, big data stream techniques and algorithms, big probabilistic data management, big data privacy, big data visualizations, and big data applications (e.g., spatial, finance, multimedia, medical, health, and social data). This course will also explore the current challenges facing big data computing.

Catalog Description:

The course provides the student with an advanced understanding of the issues involved in dealing with Big Data. It prepares the student for advanced handling of extremely large data sets: accessing the data, reducing the data to a manageable size, and processing the results. Students will reduce Big Data sets, and use and develop R packages and other code to analyze the data and produce graphics to explore and explain the data. This course is oriented toward projects carried out in very small teams.

 

Required Texts

Big Data Fundamentals: Concepts, Drivers & Techniques by Erl, Khattak and Buhler, 2016, Pearson, ISBN: 9780134291208 (ebook), 9780134291079 (paper).

We will also extensively refer to research papers, material available on the web and material from the following recommended textbooks.

Recommended Texts

Data Science on the Google Cloud Platform by Valliappa Lakshmanan, 2018, O'Reilly, ISBN: 9781491974568

Big-Data Analytics for Cloud, IoT & Cognitive Learning by Kai Hwang and Min Chen, 2017, John Wiley & Sons Inc, ISBN: 9781119247029

Hadoop: The Definitive Guide, Third Edition, by Tom White (O'Reilly)

Data-Intensive Text Processing with MapReduce, by Jimmy Lin and Chris Dyer

R for Data Science by Hadley Wickham and Garrett Grolemund, O'Reilly, 2016, ISBN: 9781491910399

 

Goals

  • Understand the big data characteristics and challenges
  • Understand real world applications and their techniques involving big data
  • Know the current big data processing platforms and tools
  • Understand big data collection, integration and storage
  • Learn big data indexing
  • Learn various queries over big data
  • Learn the core techniques of processing big data

Learning Outcomes

  • Familiarity with big data characteristics and challenges
  • Proficiency with at least one comprehensive big data handling tool
  • Experience with surveying big-data-related topics and presenting them (orally as well as written)
  • Design and implementation of an R&D project on big data problems
  • Collaboration with team members to study big data techniques and learn big data tools

Students' progress and achievements towards reaching the course goals and objectives will be apparent from such measures as the results and creativity displayed in course assignments, and the contents of and performance on certain exam portions.

QuickLinks

Syllabus for Fall 2021

Topics Covered in Fall 2021

Also see the NSF/IEEE-TCPP Curriculum Initiative and the Computer Science Accreditation Board, in particular ACM Computer Science Curricula 2013. For a quick preview of PDCS, HPC and cloud computing topics in CS courses, read the Parallel Processing pages taken from CS Curricula 2013, and the TCPP Curriculum Initiative pages from CS Curricula 2013.