CMPSC 445: Applied Machine Learning in Data Science

Textbook Information

No purchase is required.

  • Textbook 1: Data Science from Scratch: First Principles with Python by Joel Grus, Second Edition, 2019 (ISBN: 978-1-4920-4113-9), available free online to Penn State students
  • Textbook 2: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron, Second Edition, 2019 (ISBN: 978-1492032649), available free online to Penn State students

Published Remarks

  • None

Hardware Requirements

  • None

Software Requirements

  • None

Proctored Exams

  • None

Course Description

Overview

Topics covered in this course include:

  • Machine Learning Fundamentals
    • Supervised and unsupervised learning techniques
    • Deep learning
    • Natural Language Processing
  • Web programming and tools
  • Introduction to IBM Cloud and AWS developer cloud functionality and cognitive service APIs
  • Exploring recommended topic domains and building a corpus of data that maximizes question-answering (Q/A) accuracy
  • Building the application prototype: training, testing, evaluation, and improvement

Prerequisites:

CMPSC 122/132 Intermediate Programming, MATH 220 Matrices, and STAT 318 Elementary Probability

Objectives

By the end of this course, you will be able to:

  • apply machine learning techniques to build models and make predictions with them
  • discern the advantages and disadvantages of machine learning models and of the methods used to evaluate them
  • explain cloud services (IBM Watson and AWS) and their underlying technologies in machine learning and natural language processing
  • develop a web-based application that solves a real-world challenge
  • build a corpus of data in a domain using recommended types of text content, and understand how corpora are ingested and used to train models for accuracy

Learning Topics:

  • Module 1: Introduction to artificial intelligence and machine learning. Overview of supervised and unsupervised learning. Material on Python for C++ programmers. Introduction to Anaconda.
  • Module 2: Linear Algebra and Probability Review. Python machine learning libraries, including NumPy, SciPy, pandas, and scikit-learn.
  • Module 3: Testing and Cross-Validation. Decision Tree Learning. Linear and Logistic Regression.
  • Module 4: K-Nearest Neighbors (K-NN). Naïve Bayes. Support Vector Machine (SVM).
  • Module 5: K-means. Principal Component Analysis (PCA). Gradient Descent.
  • Module 6: Artificial Neural Networks. Feed-forward networks and back-propagation.
  • Module 7: Introduction to Deep Learning. Convolutional Neural Networks. TensorFlow and Keras.
  • Module 8: Team Project: description, team formation, and proposal presentation and submission. Example machine learning projects.
  • Module 9: Natural Language Processing (NLP) techniques. Keyword extraction.
  • Module 10: Cloud machine learning. Data science platforms. Transferring information over the web: JSON.
  • Module 11: Question-answering systems. Chatbots. Team Project presentation feedback.
  • Module 12: Server and client: HTML, JavaScript, RESTful APIs. Team Project implementation.
  • Module 13: More examples of IBM Watson and AWS services. Team Project final demo and presentations.
  • Module 14: Team Project final demo and presentations. Final report and code delivery.

This course supports the following ABET computer science (CS) outcomes:

  • (#2) Design, implement, and evaluate a computing-based solution to meet a given set of computing requirements in the context of the program’s discipline.
  • (#4) Recognize professional responsibilities and make informed judgments in computing practice based on legal and ethical principles.
  • (#5) Function effectively as a member or leader of a team engaged in activities appropriate to the program’s discipline.

This course supports the following ABET software engineering (SE) outcomes:

  • (#4) an ability to recognize ethical and professional responsibilities in engineering situations and make informed judgments, which must consider the impact of engineering solutions in global, economic, environmental, and societal contexts
  • (#5) an ability to function effectively on a team whose members together provide leadership, create a collaborative and inclusive environment, establish goals, plan tasks, and meet objectives
  • (#6) an ability to develop and conduct appropriate experimentation, analyze and interpret data, and use engineering judgment to draw conclusions

Course Requirements and Grading

Final grades are based on exams, deliverables, and project submissions, weighted as follows:

Activities and Assignments      Weight        Individual/Group
Lab Assignments and Quizzes     40%           Individual
Course Projects                 30%           Group
Exam 1                          15%           Individual
Exam 2                          15%           Individual
Total                           100%
Participation (bonus)           5% (extra)    Individual

If you believe a judgment or calculation error has been made in grading any of your submissions, document it in detail (for example, with screenshots or a reference to the relevant section of the textbook) in an email sent within one week after the grade is published. I will review it and reply with a decision as soon as possible.

Grading scale: A Penn State-compliant grading scale will be used to determine the final letter grade. A perfect attendance record may help students who are within one percentage point of the next higher grade.

Grade    Score
A        94 to 100
A-       90 to below 94
B+       87 to below 90
B        83 to below 87
B-       80 to below 83
C+       75 to below 80
C        70 to below 75
D        60 to below 70
F        below 60
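To illustrate how the weights and the grading scale combine, the following Python sketch computes a weighted course score and maps it to a letter grade. It is an unofficial illustration, not the instructor's grading tool. It assumes that each component is scored on a 0-100 scale, that the 5% participation bonus is added on top of the weighted total, and that the score is not rounded before the cutoffs are applied. The syllabus does not specify these details, and the function and dictionary key names are hypothetical.

```python
# Unofficial sketch: weights taken from the grading table above.
# ASSUMPTION: every component score is on a 0-100 scale.
WEIGHTS = {"labs_quizzes": 0.40, "project": 0.30, "exam1": 0.15, "exam2": 0.15}

# ASSUMPTION: participation is extra credit added on top of the 100% total.
BONUS_WEIGHT = 0.05

# Inclusive lower bound for each letter grade, from highest to lowest.
CUTOFFS = [(94, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
           (75, "C+"), (70, "C"), (60, "D")]


def final_score(scores, participation=0.0):
    """Return the weighted course score plus the participation bonus."""
    base = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return base + BONUS_WEIGHT * participation


def letter_grade(score):
    """Map a numeric score to a letter grade (ASSUMPTION: no rounding)."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"
```

For example, component scores of 90, 85, 80, and 88 with full participation give 36 + 25.5 + 12 + 13.2 + 5 = 91.7, which falls in the A- range.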