Homework 2
Exercises
Question 1 (Part 1: Generate Synthetic DNA Data)
Modify the scoring function to create a more complex pattern. Instead of giving fixed bonuses for TAT and GCG, implement position-dependent scoring in which a motif earns a higher bonus when it appears near the beginning of the sequence than when it appears near the end. How does this change the distribution of scores?
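One possible starting point is sketched below. It is not the scoring function from Part 1: the base bonuses in MOTIF_BONUS, the linear decay from start_weight to end_weight, and the names position_weight and score_sequence are all illustrative assumptions. Replace them with the values and structure your own Part 1 code uses.

```python
# Hypothetical base bonuses; substitute the values used in your Part 1 code.
MOTIF_BONUS = {"TAT": 2.0, "GCG": 3.0}


def position_weight(pos, seq_len, motif_len=3, start_weight=1.0, end_weight=0.25):
    """Linearly interpolate a weight from start_weight (motif at position 0)
    down to end_weight (motif at the last possible start position)."""
    last_start = seq_len - motif_len
    if last_start <= 0:
        # The motif can only sit at position 0, so there is nothing to decay.
        return start_weight
    frac = pos / last_start
    return start_weight + frac * (end_weight - start_weight)


def score_sequence(seq):
    """Sum position-weighted bonuses over every (possibly overlapping) motif hit."""
    score = 0.0
    for motif, bonus in MOTIF_BONUS.items():
        k = len(motif)
        for pos in range(len(seq) - k + 1):
            if seq[pos:pos + k] == motif:
                score += bonus * position_weight(pos, len(seq), k)
    return score


early = score_sequence("TATAAAAAAA")  # TAT at the very start
late = score_sequence("AAAAAAATAT")   # TAT at the very end
print(early, late)
```

Under these assumptions the same motif contributes less the later it appears, so comparing a histogram of these scores against the fixed-bonus scores is one way to approach the distribution question. Other decay shapes (exponential, step-wise) are equally valid choices.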
Question 2 (Part 5: Train and Compare)
Compare the performance of the Linear and CNN models across different learning rates. First train both models with higher learning rates (0.05, 0.1) and lower learning rates (0.005, 0.001), then create loss plots showing:
- Linear model with these learning rates
- CNN model with these learning rates
Then analyze your results by answering:
- How does changing the learning rate affect convergence for each model?
- Which model is more sensitive to learning rate changes, and why?
- Based on your analysis, what learning rate would you recommend for each model type, and why?
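The sweep itself can be organized as a small loop that records one loss history per learning rate. The sketch below assumes a hypothetical train_fn(lr) that returns a list of per-epoch losses. Here train_toy_model is only a stand-in (plain gradient descent on a one-parameter quadratic) so the example runs on its own; in your solution, swap in the Part 5 training loop for the Linear and CNN models.

```python
def train_toy_model(lr, n_epochs=20):
    """Stand-in for a real training loop: gradient descent on loss(w) = (w - 3)^2.
    Returns the loss recorded at the start of each epoch."""
    w = 0.0
    losses = []
    for _ in range(n_epochs):
        losses.append((w - 3.0) ** 2)
        grad = 2.0 * (w - 3.0)
        w -= lr * grad
    return losses


def sweep_learning_rates(train_fn, learning_rates):
    """Run train_fn once per learning rate and collect the loss histories."""
    return {lr: train_fn(lr) for lr in learning_rates}


# The four learning rates named in the question.
histories = sweep_learning_rates(train_toy_model, [0.1, 0.05, 0.005, 0.001])
for lr, losses in histories.items():
    print(f"lr={lr}: first loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Keeping the histories in one dictionary per model makes it straightforward to draw all learning rates on a shared axis, which is what the two requested plots need.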
Question 3 (Part 6: Evaluate on Test Set)
Design an approach to improve the model’s prediction accuracy, particularly focusing on the sequences where the current model performs poorly:
- After identifying sequences where the CNN model has high prediction errors, propose and implement a modification to one of the following: the model architecture, the loss function, the training process, or the data representation
- Retrain the model with your modifications
- Create comparative visualizations (such as scatter plots, error histograms, or other appropriate plots) to demonstrate the impact of your changes
- Analyze your results by discussing how your modification addresses the specific weaknesses you identified. What are the trade-offs involved in your approach?
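For the first step, identifying where the model performs poorly, one simple approach is to rank test sequences by absolute prediction error. The helper below is a minimal sketch: the name worst_predictions, the top_k parameter, and the toy inputs are assumptions made for illustration, and it expects plain Python lists, so convert tensors or arrays first.

```python
def worst_predictions(sequences, y_true, y_pred, top_k=5):
    """Return the top_k examples with the largest absolute error,
    as (sequence, true score, predicted score, absolute error) tuples."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    # Indices sorted from largest error to smallest.
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return [(sequences[i], y_true[i], y_pred[i], errors[i]) for i in order[:top_k]]


# Toy inputs for illustration only; use your test set and CNN predictions instead.
toy_sequences = ["AAAA", "TATA", "GCGC"]
toy_true = [0.0, 2.0, 3.0]
toy_pred = [0.1, 0.5, 2.9]

for seq, true, pred, err in worst_predictions(toy_sequences, toy_true, toy_pred, top_k=2):
    print(f"{seq}: true={true}, pred={pred}, abs error={err:.2f}")
```

Inspecting what the worst-ranked sequences have in common (for example motif count or motif position) can help motivate which modification to try, and the same ranking run before and after retraining gives a direct basis for the comparative plots.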