© HakyImLab and Listed Authors - CC BY 4.0 License

---
title: "TF binding prediction challenge"
author: Haky Im
date: 2026-04-06
description: "Competition details for TF binding prediction challenge"
categories:
eval: false
draft: false
---

# TF Binding Prediction Challenge

## Overview
- **Goal**: Predict transcription factor (TF) binding scores in DNA sequences
- **Input**: 300bp human DNA sequences
- **Target**: Binding scores for transcription factors

## Data Description
The challenge uses real genomic data to predict TF binding scores:

- **Sequence Data**: `chr#_sequences.txt.gz` files containing 300bp DNA sequences
  - Each sequence has a unique identifier in format `chr#_start_end`
  
- **Target Data**: `chr#_scores.txt.gz` files containing binding scores
  - Each sequence has a corresponding vector of 300 binding scores
  - Scores are predicted using [Homer](http://homer.ucsd.edu/homer/motif/index.html), a widely used motif discovery tool
  - Each position in the vector represents the binding score at that position in the sequence
  
Data prepared by Sofia Salazar.
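
As a starting point for working with these files, the sketch below parses a `chr#_start_end` identifier and one-hot encodes a DNA sequence so it can be fed to a model. It is a minimal illustration, not the course's loader: the function names, the `ACGT` channel order, the all-zero treatment of ambiguous bases, and the example identifier `chr1_1000_1300` are all assumptions, and the exact layout of the `.txt.gz` files should be checked against the starter notebook.

```python
from typing import List, Tuple

# Channel order is an assumption; the challenge does not specify one.
BASES = "ACGT"


def parse_region_id(region_id: str) -> Tuple[str, int, int]:
    """Split an identifier such as 'chr1_1000_1300' into (chrom, start, end)."""
    # rsplit keeps any underscores inside the chromosome name intact.
    chrom, start, end = region_id.rsplit("_", 2)
    return chrom, int(start), int(end)


def one_hot_encode(seq: str) -> List[List[float]]:
    """Return a 4 x len(seq) matrix with one row per base in BASES.

    Bases outside ACGT (for example N) are left as all-zero columns,
    which is one common convention and an assumption here.
    """
    seq = seq.upper()
    return [[1.0 if base == channel else 0.0 for base in seq] for channel in BASES]


# Hypothetical identifier and a toy sequence, for illustration only.
chrom, start, end = parse_region_id("chr1_1000_1300")
encoded = one_hot_encode("ACGTN")
print(chrom, start, end)
print(len(encoded), len(encoded[0]))
```

For a 300bp input this yields a 4 x 300 matrix, which can be converted to a tensor and paired with the 300 scores for the same identifier.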

## Getting Started
- [Data download link](https://uchicago.box.com/s/eajhnujlaxnd5441sv3dt73pfoxvgh4l)
- [Starter notebook](notebook-04-tf-binding.qmd)
<!-- - [Example implementation]() -->

## Timeline
- **Training Sessions**:
  - Tuesday, April 7: Sofia will review the implementation of the basic DNA scoring model. Students will continue working on the project. Charles, Sofia, and Ran will be available to help.
  > Note: provide this baseline code; `notebook-3-dna-cnn` had a simplified version that is probably not compatible with the wandb implementation.
  - Thursday, April 9: Sofia will explain how to use **Weights & Biases** to tune the model's hyperparameters (learning rate, number of filters, kernel size, etc.). Charles, Sofia, Ran, and Haky will be available to help.

- **Presentation Day** (Thursday, April 9):
  Students will present preliminary results:
  - Model architecture
  - Model performance
  - Filter interpretation (time permitting)
  - Lessons learned
  - Zoom link for the presentation: https://uchicago.zoom.us/j/92299280676?pwd=nnx7WMjZb8ds5b5Wkj3jVsulN11Eyd.1

- **Submission Deadline**: April 17
  - Submit best model to Canvas (follow these instructions to share the trained model weights)
  - TA (Festus) will test on held-out data (TODO add link to box here, only accessible by instructors and TA)
  - A leaderboard will be created
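
The hyperparameter search described for the Thursday, April 9 session could be set up roughly as follows. This is a sketch under assumptions rather than the course's configuration: the metric name `val_loss`, the project name, the value ranges, and the `run_sweep` helper are hypothetical. It uses the `wandb` package's `wandb.sweep` and `wandb.agent` calls, and searches over the learning rate, number of filters, and kernel size named in the timeline.

```python
# Hypothetical search space; the ranges below are placeholders to adjust.
sweep_config = {
    "method": "random",  # random search over the space below
    "metric": {"name": "val_loss", "goal": "minimize"},  # assumed metric name
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-4,
            "max": 1e-2,
        },
        "num_filters": {"values": [32, 64, 128]},
        "kernel_size": {"values": [7, 11, 15]},
    },
}


def run_sweep(train_fn, project: str = "tf-binding-challenge", count: int = 10) -> str:
    """Register the sweep and run `count` trials of `train_fn`.

    Requires the `wandb` package and a logged-in account.
    """
    import wandb  # imported lazily so the config can be inspected without wandb

    sweep_id = wandb.sweep(sweep=sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn, count=count)
    return sweep_id
```

Here `train_fn` would be a function that calls `wandb.init()`, reads the sampled values from `wandb.config`, trains the model, and logs the metric named in `sweep_config` so the sweep can compare runs.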