Please find the requirement in the attached document.
Need this assignment by Tuesday, 9/20, 10:00 PM.
No copy and paste. Plagiarism results in course termination.
Assignment 1 Due Date/Time: 9/23/2021, 11:59 PM
Total Points: 100
You will implement the K-means clustering and Fuzzy C-means clustering from scratch using a programming language of your choice. Follow software design principles and document (comment) your code clearly explaining what you did and why you did what you did. In your report, include a README that states how your code is supposed to be run to obtain the expected results.
You will use a dataset representing ten years of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. The dataset is included in the assignment with the filename diabetic_data.csv.
Use the Euclidean distance to compute the distance between any two patients in the dataset. You will run your clustering algorithms with different combinations of variables as specified in each question.
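As a minimal sketch of the distance measure named above, the function below computes the Euclidean distance between two patients represented as equal-length numeric tuples. The function name `euclidean_distance` and the two example patients are illustrative assumptions, not part of the assignment.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length numeric feature vectors."""
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Two made-up patients described by (time_in_hospital, num_medications).
patient_a = (3, 18)
patient_b = (6, 22)
print(euclidean_distance(patient_a, patient_b))  # 5.0
```

Because the variables are on different scales, the variable with the larger range dominates this distance. Whether to rescale the variables first is a design choice the assignment leaves open.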
1. K-means clustering with different numbers of clusters (30 points)
a. Run K-means on the entire dataset with the following two variables: ‘time_in_hospital’ and ‘num_medications’ with the number of clusters K = 2. Plot your clusters using a 3D scatter plot and report (print) the centroid locations. Based on this plot, what are your thoughts on the generated clusters?
b. Test with different numbers of clusters K, running from K = 2 to K = 10 using the same variables in 1a. According to the scatter plots, which number of clusters do you think is the most appropriate? Justify your response.
c. Implement the Dunn index (DI) cluster validity measure from scratch. Repeat the experiments in problem 1b and compute the corresponding Dunn indices. Which number of clusters do you believe is the best according to the Dunn indices? Does this agree with your initial observation in problem 1b?
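The two building blocks of question 1 can be sketched as follows, under stated assumptions. `kmeans` is a plain Lloyd-style iteration with centroids initialised from randomly chosen data points. `dunn_index` uses one common variant of the Dunn index (smallest distance between points in different clusters, divided by the largest distance between points in the same cluster); other variants exist, and the assignment does not say which one is intended. The function names, the `seed` and `max_iter` parameters, and the toy points are all hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k data points drawn at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def dunn_index(points, labels):
    """Min inter-cluster point distance divided by max intra-cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    if len(groups) < 2:
        raise ValueError("the Dunn index needs at least two non-empty clusters")
    max_diameter = max(math.dist(a, b) for g in groups for a in g for b in g)
    min_separation = min(
        math.dist(a, b)
        for i, g in enumerate(groups)
        for h in groups[i + 1:]
        for a in g
        for b in h
    )
    return math.inf if max_diameter == 0 else min_separation / max_diameter


# Toy data: two well-separated groups of made-up points.
toy_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
toy_centroids, toy_labels = kmeans(toy_points, k=2)
print(toy_centroids, dunn_index(toy_points, toy_labels))
```

This pairwise Dunn index compares every point with every other point, so its cost grows quadratically with the number of observations. On the full dataset it would likely need a cheaper variant or a subsample.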
2. K-means clustering with different variables and sample size (30 points)
a. Based on the best number of clusters you obtained in problem 1c and the two variables, does adding the ‘insulin’ variable (total 3 variables) improve clustering results for any 30 patients randomly selected? Use scatter plots or any other equivalent method to justify your response.
b. Based on the model in problem 2a, does adding the ‘diabetesMed’ and ‘change’ variables (total five variables) improve the clustering results for the same 30 patients? Plot the results and compute the Dunn index to justify your response.
c. Randomly sample 50,000 observations and 10,000 observations from the entire dataset and re-run 2a and 2b for each sample size. Plot the clustering results and compute the Dunn index for each sample size and compare the results with 50,000 and 10,000 observations vs the entire dataset. Justify what you observe.
d. (Bonus): What happens to the relative positioning of the centroids as you sample fewer observations (50,000, 10,000, 5,000) from the data? Do the centroids go farther apart, or do they get closer after your clustering algorithm has converged? Justify why. Plot your findings (sample size (x-axis) vs Dunn Index (y-axis)). (Bonus: 10 points)
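The sampling steps in question 2 depend on drawing a reproducible subset, so that the same patients can be reused across 2a and 2b. Below is a minimal sketch using only the standard library. The helper names `load_features` and `sample_rows`, the seed value, and the in-memory toy CSV (with made-up values) are assumptions for illustration; only the column names come from the assignment. Columns that hold non-numeric values would need an encoding step first, which this sketch does not cover.

```python
import csv
import io
import random


def load_features(csv_file, columns):
    """Read the named numeric columns from an open CSV file as a list of tuples."""
    reader = csv.DictReader(csv_file)
    return [tuple(float(row[c]) for c in columns) for row in reader]


def sample_rows(rows, n, seed=42):
    """Draw n rows without replacement; a fixed seed makes the draw repeatable."""
    if n > len(rows):
        raise ValueError("sample size exceeds the number of rows")
    return random.Random(seed).sample(rows, n)


# Toy stand-in for diabetic_data.csv with made-up values.
toy_csv = io.StringIO(
    "time_in_hospital,num_medications\n"
    "1,18\n"
    "3,13\n"
    "2,16\n"
    "5,21\n"
)
rows = load_features(toy_csv, ["time_in_hospital", "num_medications"])
subset = sample_rows(rows, 2)
print(subset)
```

With the real file, the same `load_features` call would presumably take the handle returned by `open("diabetic_data.csv", newline="")` in place of the toy buffer.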
3. Fuzzy C-means clustering (40 points)
a. Implement Fuzzy C-means and apply it to the entire dataset with the best number of clusters you selected in problem 1 and the best combination of variables you selected in problem 2. Was there any difference in the clusters as compared to the K-means clusters? (Compare using visualization tools, using centroid values, OR using some labels and observing the differences).
b. Harden the cluster assignment of Fuzzy C-means and use the Dunn index to compare it with the K-means clustering result. Is there any difference in the results? Which clustering algorithm do you think produces better clusters and why?
c. Select one more variable by exploring the data and add this variable into the model in problem 3a. Does adding this new variable improve the clustering results? Why or why not? If you experiment with different variables for 3c, please mention that, along with the variables you tried and why you chose that particular additional variable.
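For question 3, the standard Fuzzy C-means updates can be sketched as follows: membership $u_{ij} = 1 / \sum_k (d_{ij}/d_{ik})^{2/(m-1)}$, and each center is the mean of the points weighted by $u_{ij}^m$. The fuzzifier `m = 2.0`, the stopping tolerance, the optional `init` argument, the helper names, and the toy points are assumptions made for this sketch; the assignment does not fix any of them. "Hardening" is taken here to mean assigning each point to the cluster with its largest membership.

```python
import math
import random


def memberships(point, centers, m):
    """Membership of one point in every cluster (the values sum to 1)."""
    dists = [math.dist(point, ctr) for ctr in centers]
    zeros = [j for j, d in enumerate(dists) if d == 0.0]
    if zeros:
        # The point sits exactly on a center: share full membership there.
        return [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0, init=None):
    """Return (centers, membership matrix) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Assumption: start from given centers, or from c random data points.
    centers = [tuple(p) for p in (init if init else rng.sample(points, c))]
    dims = len(points[0])
    for _ in range(max_iter):
        u = [memberships(p, centers, m) for p in points]
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break  # centers have stopped moving
    # Recompute memberships so they match the final centers.
    return centers, [memberships(p, centers, m) for p in points]


def harden(u):
    """Hard label per point: the index of its largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


# Toy data: two made-up groups of points.
toy = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
toy_centers, toy_u = fuzzy_c_means(toy, c=2)
print(toy_centers, harden(toy_u))
```

The hardened labels have the same shape as K-means labels, so they can be passed to the same Dunn index routine for the comparison in 3b.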
Submission Instructions: Submit a zipped file containing your code(s) and report (in pdf) in the Dropbox folder titled “Assignment 1-LastName” on Pilot.
Academic Integrity: Please note that the code and report you submit should be your work and yours alone. If plagiarism is detected, it will be dealt with strictly and in accordance with Wright State guidelines.