MotifFinder
##Usage##
motifs <genomeID> <bed_file_dir> <looplist> [custom_global_motif_list]
The required arguments are:
- <genomeID>: Genome ID, for example hg19 or hg38
- <bed_file_dir>: File path to a directory which contains two folders: "unique" and "inferred". These folders should contain a combination of RAD21, SMC3, and CTCF BED files. By intersecting these 1D tracks, the strongest peaks will be identified. Unique motifs generally use a more stringent combination of BED files than inferred motifs.
- <looplist>: List of peaks in standard 2D feature format (chr1 x1 x2 chr2 y1 y2 color ...)
Optional arguments:
- [custom_global_motif_list]: Motif list output using FIMO format (by default, Juicer will attempt to find the file from an online repository). Genomewide FIMO motifs are available on Box under /opt/juicer/references/genomewide_ctcf_motif_fimo.
##Examples##
Assuming the following file structure is present:
/path/to/local/bed/files/unique/CTCF.bed
/path/to/local/bed/files/unique/RAD21.bed
/path/to/local/bed/files/unique/SMC3.bed
/path/to/local/bed/files/inferred/CTCF.bed
motifs hg19 /path/to/local/bed/files gm12878_hiccups_loops.txt hg_19_custom_motif_list.txt
This command will find motifs from hg_19_custom_motif_list.txt for the loops in gm12878_hiccups_loops.txt and save them to gm12878_hiccups_loops_with_motifs.txt. The CTCF, RAD21, and SMC3 BED files will be used together (i.e. intersected) to find unique motifs. Just the CTCF track will be used to infer best motifs.
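Before running the command, it can help to confirm that the BED directory has the layout described above. The following is a minimal sketch, not part of Juicer: the function name `check_bed_dir` is hypothetical, and it assumes that the BED files carry a `.bed` extension, as in the example file structure.

```python
from pathlib import Path


def check_bed_dir(bed_file_dir):
    """Return the BED file names found in the 'unique' and 'inferred' folders.

    Hypothetical helper, not part of Juicer. Assumes BED files end in '.bed'.
    Raises FileNotFoundError if a folder is missing and ValueError if a
    folder contains no BED files.
    """
    root = Path(bed_file_dir)
    found = {}
    for sub in ("unique", "inferred"):
        folder = root / sub
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")
        # Sort the names so the result is stable across file systems.
        beds = sorted(p.name for p in folder.glob("*.bed"))
        if not beds:
            raise ValueError(f"no .bed files in: {folder}")
        found[sub] = beds
    return found
```

Calling `check_bed_dir("/path/to/local/bed/files")` on the example layout would return the file names grouped by folder, which makes it easy to see which tracks will be intersected for unique motifs and which for inferred motifs.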
##Result##
Motif Finder will create a new file named after the input loop list with the suffix _with_motifs.txt (for example, gm12878_hiccups_loops_with_motifs.txt), which appends 10 fields to each loop in the loop list. See original loop list fields here.
The additional fields use the following format:
motif_x1 motif_x2 sequence1 orientation1 uniqueness1
motif_y1 motif_y2 sequence2 orientation2 uniqueness2
Explanations of each field are as follows:
- motif_x1,x2 = the start and end coordinates of the localized CTCF motif within the upstream loop anchor
- sequence1 = the sequence of the localized CTCF motif within the upstream loop anchor
- orientation1 = the orientation of the localized CTCF motif within the upstream loop anchor; ‘p’ if the motif sequence is on the forward strand and ‘n’ if the motif sequence is on the reverse strand
- uniqueness1 = whether the localized CTCF motif within the upstream loop anchor was uniquely called (‘u’) or inferred based on the convergent orientation principle (‘i’)
- motif_y1,y2; sequence2; orientation2; and uniqueness2 are the same as above except for the downstream loop anchor.
Fields set to ‘NA’ indicate that we were unable to localize the anchor to a single CTCF motif.
See Section VI.e.7 of the Extended Experimental Procedures of Rao, Huntley et al. Cell 2014 for more details.
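The field layout above can be read back with a short script. The following is a minimal sketch under stated assumptions, not Juicer code: it assumes the output is tab-delimited, that the 10 motif fields are the last 10 columns of each loop line, and that motif coordinates are integers. The names `AnchorMotif`, `parse_motif_fields`, and `is_convergent` are hypothetical. The convergence check treats a ‘p’ motif at the upstream anchor paired with an ‘n’ motif at the downstream anchor as convergent, which is an assumed reading of the convergent orientation principle mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnchorMotif:
    start: Optional[int]        # motif_x1 / motif_y1, None if 'NA'
    end: Optional[int]          # motif_x2 / motif_y2, None if 'NA'
    sequence: Optional[str]     # motif sequence, None if 'NA'
    orientation: Optional[str]  # 'p' (forward strand) or 'n' (reverse strand)
    uniqueness: Optional[str]   # 'u' (unique) or 'i' (inferred)


def _parse_anchor(fields):
    """Convert five raw text fields into an AnchorMotif, mapping 'NA' to None."""
    start, end, sequence, orientation, uniqueness = (
        None if f == "NA" else f for f in fields
    )
    return AnchorMotif(
        start=None if start is None else int(start),
        end=None if end is None else int(end),
        sequence=sequence,
        orientation=orientation,
        uniqueness=uniqueness,
    )


def parse_motif_fields(line):
    """Return (upstream, downstream) motifs from one loop line.

    Assumption: columns are tab-separated and the motif fields are the
    final 10 columns, after at least the six loop coordinate columns.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 16:
        raise ValueError("expected at least 16 tab-separated columns")
    motif_cols = cols[-10:]
    return _parse_anchor(motif_cols[:5]), _parse_anchor(motif_cols[5:])


def is_convergent(upstream, downstream):
    """True if the upstream motif is forward ('p') and the downstream is reverse ('n')."""
    return upstream.orientation == "p" and downstream.orientation == "n"


if __name__ == "__main__":
    # Made-up loop line for illustration only; the values are not real data.
    example = "\t".join([
        "chr1", "100", "200", "chr1", "900", "1000", "0,255,0",
        "150", "169", "ACGTACGT", "p", "u",
        "950", "969", "TGCATGCA", "n", "i",
    ])
    up, down = parse_motif_fields(example)
    print(up, down, is_convergent(up, down))
```

Anchors that could not be localized come back with every attribute set to `None`, so a caller can filter on `upstream.start is None` before comparing orientations.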