College of Computing, Engineering, and Construction
Department of Computer and Information Sciences
4567 St. Johns Bluff Rd. South, Jacksonville, FL 32224
Prerequisite: COP4720 (Database Systems) or COP4710 (Data Modeling)
The rapid growth in the volume of data stored in databases and data warehouses has outpaced our ability to carefully analyze and understand what we have collected. Data mining is the science of analyzing data to discover hidden knowledge and identify interesting patterns that may exist within these mountains of data. This course approaches data mining from an Artificial Intelligence/Machine Learning perspective. Topics include Characterization and Comparison, Association Rule Mining, Classification and Prediction, Cluster Analysis, and Mining Complex Types of Data. The course also covers Applications and Trends in Data Mining.
- J. Han and M. Kamber, “Data Mining: Concepts and Techniques,” Morgan Kaufmann, 2001. ISBN: 1-55860-489-8.
- M. Kantardzic, “Data Mining: Concepts, Models, Methods and Algorithms,” Wiley-IEEE Press, 2002. ISBN: 0-471-22852-4.
- T. Mitchell, “Machine Learning,” McGraw-Hill, 1997. ISBN: 0-07-042807-7.
- Links to research papers will be posted on Blackboard.
Course Outline:
- Introduction
- Data Warehousing and OLAP Technology for Data Mining
- Data Preprocessing
  - Data Cleaning, Integration, Transformation, and Reduction
  - Discretization and Concept Hierarchy Generation
- Data Mining Primitives, Languages, and Systems
- Characterization and Comparison
  - Data Generalization and Summarization
  - Analytical Characterization
- Mining Association Rules
  - The Apriori Algorithm
  - Finding Frequent Itemsets
  - Generating Association Rules from Frequent Itemsets
- Classification and Prediction
  - Decision Tree Induction
  - Naïve Bayesian Classification
  - Classification Using Bayesian Belief Networks
  - Neural Networks (Backpropagation Classification)
  - Genetic Algorithms
  - Linear and Multiple Regression
- Cluster Analysis
  - Partitioning Methods
  - Hierarchical Methods
  - Density-Based Methods
  - Grid-Based Methods
  - Model-Based Methods
  - Outlier Analysis
- Mining Complex Types of Data
  - Mining Spatial Databases
  - Mining Multimedia Databases
  - Mining Time-Series and Sequential Data
  - Mining Text Databases
  - Mining the World Wide Web
- Applications and Trends in Data Mining
Individual Assignments and Team Project:
Individual assignments and the team project will focus on fundamental data mining operations. Each section will be followed by a combination of essay questions and programming problems to assess students' comprehension. In each assignment, students will be required to implement some of the surveyed algorithms: for example, the Apriori algorithm for the association rule mining section, decision tree induction for the classification and prediction section, and the CURE and Chameleon algorithms for the clustering section.
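As a point of orientation for the association rule mining assignment, here is a minimal sketch of the frequent-itemset phase of Apriori. It is not a reference solution. Students may use C, C++, or Java, and Python is used here only for brevity. The function name `apriori`, the use of an absolute support count for `min_support`, and the toy transactions are all illustrative assumptions. The sketch alternates candidate generation with a counting pass over the data. Any candidate with an infrequent subset is pruned, because it cannot be frequent itself.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset that
    occurs in at least `min_support` transactions (an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        # One pass over the data: count the transactions containing each
        # candidate, then keep only candidates meeting min_support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate is a subset of this transaction
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # L1: frequent 1-itemsets.
    singletons = {frozenset([item]) for t in transactions for item in t}
    current = count_frequent(singletons)
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step (simplified): unions of two frequent (k-1)-itemsets
        # that contain exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): drop any candidate that has an
        # infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count_frequent(candidates)  # L_k
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    # Hypothetical toy data: five transactions over items a, b, c.
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(apriori(data, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

With `min_support = 3` on the toy data, each single item has support 4 and each pair has support 3. The itemset {a, b, c} is not reported because it appears in only two transactions. The join here takes unions of pairs of frequent (k-1)-itemsets, rather than using the classic join that merges itemsets sharing their first k-2 items. After pruning, both approaches yield the same candidates. Support counting is a plain scan with no hash tree, so the sketch is meant for small data sets only.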
In the team project, students will be required to write a proposal with a timeline and to conduct a literature review related to their particular project. The implementation should use real data, which can be obtained from sources such as the University of California, Irvine (UCI) Machine Learning Repository or the U.S. Census Bureau. Each team is expected to submit a progress report detailing implementation successes, obstacles, and deviations from the initial proposal. Finally, each team is expected to present the project to the class and to submit a final project report.
Graduate students may be required to write a term paper.
Knowledge of C, C++, or Java is required of all students. Students will also be introduced to logic programming using Prolog.
No specific programming tool or software package is required for the course. However, students will be encouraged to use statistical software (a spreadsheet or GNU R), a neural network simulator (SNNS), or a data mining package (DBMiner) for their team project. All of the software mentioned above is either available at UNF or can be obtained by students free of charge.