Sequence motif analysis #176

douweschulte · 2022-06-22T15:41:31Z

Some recent polyclonal datasets have multiple sequences on a single template, so there needs to be some way to find the motifs in the reads. This means that variants have to be tracked to see which ones correlate. For ideas see: https://meme-suite.org/meme/.

A naive algorithm to create such results would be to go over all reads and combine the ones that fit together (using fuzzy matching based on the alignment) into patches of sequence. All patches supported by at least 2% of all reads (or some other cutoff) can then be presented at the right location. This would allow the user to see the bigger picture of the alignment with the number of sequencing mistakes drastically reduced.

Example reads

TEMPLATESEQUENCE
TEMPLETE
   PLATESEQ
  METING
SOME

Should compress to the following:

TEMPLATESEQUENCE
TEMPLATESEQ
SOMETING
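
As a rough illustration of the naive algorithm, here is a minimal Python sketch. It is not the project's implementation. The function name `merge_reads`, the parameters `max_mismatch` and `min_overlap`, the gapless `(offset, sequence)` read representation, and the rule that ties are resolved in favour of the template are all assumptions made for this example. Only the 2% cutoff comes from the description above.

```python
from collections import Counter


def merge_reads(reads, template, max_mismatch=0.25, min_overlap=2, min_fraction=0.02):
    """Greedily merge placed reads into patches and return (start, consensus) pairs.

    reads: list of (offset, sequence) tuples, offset relative to the template.
    All thresholds are illustrative assumptions, not values used by the tool.
    """
    patches = []  # each patch: {"columns": {position: Counter}, "reads": int}

    def consensus_at(counts, pos):
        # Most supported residue; ties go to the template residue if it is tied.
        best = max(counts.values())
        tied = [residue for residue, n in counts.items() if n == best]
        if pos < len(template) and template[pos] in tied:
            return template[pos]
        return tied[0]

    for offset, seq in reads:
        target = None
        for patch in patches:
            # Positions where this read overlaps the patch.
            shared = [(i, ch) for i, ch in enumerate(seq, start=offset)
                      if i in patch["columns"]]
            if len(shared) < min_overlap:
                continue
            mismatches = sum(ch != consensus_at(patch["columns"][i], i)
                             for i, ch in shared)
            if mismatches / len(shared) <= max_mismatch:  # the "fuzzy" match
                target = patch
                break
        if target is None:
            target = {"columns": {}, "reads": 0}
            patches.append(target)
        for i, ch in enumerate(seq, start=offset):
            target["columns"].setdefault(i, Counter())[ch] += 1
        target["reads"] += 1

    result = []
    for patch in patches:
        if patch["reads"] / len(reads) < min_fraction:
            continue  # too few supporting reads, drop this patch
        start, end = min(patch["columns"]), max(patch["columns"])
        text = "".join(consensus_at(patch["columns"][i], i)
                       for i in range(start, end + 1))
        result.append((start, text))
    return result


# The example reads from this issue, as (offset, sequence) pairs.
example_reads = [(0, "TEMPLETE"), (3, "PLATESEQ"), (2, "METING"), (0, "SOME")]
for start, text in merge_reads(example_reads, "TEMPLATESEQUENCE"):
    print(" " * start + text)
```

The greedy pass depends on the order of the reads, so a real implementation would probably need something more robust than this.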

The main missing parts right now are:

User control over the threshold for ambiguity
Annotation somewhere of where the ambiguous nodes are located
Analysis over multiple nodes
Detailed information on the supporting reads (similar to Tree branch node detail page #144)


douweschulte · 2022-10-04T13:10:14Z

After some more discussion the goal has been rephrased to: how to correlate ambiguous positions in the final sequence for each template. The result could end up looking like the graph shown below. The top line shows where in the sequence the ambiguous nodes are located, and the flowchart below it uses arrows to indicate how good the support is for a link between two ambiguous positions.

.................1......2.......................3.........4..........

 flowchart LR;
  A1-.->A2;
  A1==>B2;
  B1-->A2;
  A2==>A3;
  A2-.->B3;
  A3-->A4;
  A3-.->B4;
  B2-->B3;
  B3==>B4;

This could be hard to fully complete if EnforceUnique is turned on, but work like #146 & #190 & #191 could also help in these cases.

The intent is to run this algorithm after the whole alignment has been done, as all positions for all reads are then known. The ambiguous positions should be identified by the code (first possibility <75% score?) and connections between positions should be found in the placed reads.
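
A minimal Python sketch of these two steps, under stated assumptions: reads are gapless `(offset, sequence)` pairs, the "score" of an option is simply the number of reads supporting it, and links are only counted between consecutive ambiguous positions. The function names `build_columns`, `ambiguous_positions` and `link_support` are hypothetical, and the 75% threshold is the tentative value mentioned above.

```python
from collections import Counter


def build_columns(reads, length):
    """Count the residues observed at every template position."""
    columns = [Counter() for _ in range(length)]
    for offset, seq in reads:
        for i, ch in enumerate(seq, start=offset):
            if 0 <= i < length:
                columns[i][ch] += 1
    return columns


def ambiguous_positions(columns, threshold=0.75):
    """Positions where the best option holds less than `threshold` of the support."""
    positions = []
    for pos, counts in enumerate(columns):
        total = sum(counts.values())
        if total and max(counts.values()) / total < threshold:
            positions.append(pos)
    return positions


def link_support(reads, positions):
    """Count reads that connect residues at consecutive ambiguous positions.

    Returns a Counter keyed by (pos_a, residue_a, pos_b, residue_b).
    """
    links = Counter()
    for a, b in zip(positions, positions[1:]):
        for offset, seq in reads:
            # Only reads covering both positions can support a link.
            if offset <= a and b < offset + len(seq):
                links[(a, seq[a - offset], b, seq[b - offset])] += 1
    return links


# Made-up placed reads, purely for illustration.
placed = [(0, "ABCDEF"), (0, "AXCDYF"), (0, "ABCDEF"), (0, "AXCDYF"), (2, "CDEF")]
columns = build_columns(placed, 6)
positions = ambiguous_positions(columns)
links = link_support(placed, positions)
```

The counts in `links` would correspond to the arrow styles in the flowchart above (dotted, normal, thick), with the mapping from count to style left open.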


douweschulte · 2022-10-11T12:43:09Z

For analysis over multiple ambiguous nodes the following idea came to mind: the user can select a single position, which will remove all traces except the ones coming from that position. The higher order traces from this position will be shown as well. To give the user some feedback on which nodes have a useful level of higher order information, there should be a bar showing the sum of all higher order traces for each position, or something similar.
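
To make this idea a bit more concrete, here is a minimal Python sketch. It assumes gapless `(offset, sequence)` reads and interprets a "higher order trace" as a single read that spans three or more ambiguous positions. That interpretation, the `min_order` parameter, and the function names `read_paths`, `traces_from` and `higher_order_totals` are assumptions for this example only.

```python
from collections import Counter


def read_paths(reads, positions):
    """For each read, list the (position, residue) pairs at ambiguous positions it covers."""
    paths = []
    for offset, seq in reads:
        path = tuple((p, seq[p - offset]) for p in positions
                     if offset <= p < offset + len(seq))
        paths.append(path)
    return paths


def traces_from(paths, selected):
    """Keep only traces through the selected position that link it to another position."""
    return Counter(path for path in paths
                   if len(path) >= 2 and any(p == selected for p, _ in path))


def higher_order_totals(paths, positions, min_order=3):
    """Per position, the number of reads that give higher order information (the 'bar')."""
    totals = Counter({p: 0 for p in positions})
    for path in paths:
        if len(path) >= min_order:
            for p, _ in path:
                totals[p] += 1
    return totals


# Made-up placed reads and ambiguous positions, purely for illustration.
placed = [(0, "ABCDEFG"), (0, "AXCDY"), (3, "DEFH"), (5, "FG")]
ambiguous = [1, 4, 6]
paths = read_paths(placed, ambiguous)
selected_traces = traces_from(paths, 1)
totals = higher_order_totals(paths, ambiguous)
```

Selecting position 1 here keeps the two reads that cover it and drops the rest. `totals` is the kind of per-position sum that the bar could display.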

douweschulte added the C-enhancement (Category: New feature or request) and A-html-report (Area: Related to the HTML output report) labels Jun 22, 2022

douweschulte added a commit that referenced this issue Oct 10, 2022

Made a start on #176

1862656

douweschulte added a commit that referenced this issue Oct 11, 2022

Worked on #176 and small touch ups

7a4c0c0

Added ambiguous positions annotation in sequence consensus, added ambiguity threshold to batchfile, added range warnings to batchfiles.

douweschulte mentioned this issue Oct 11, 2022

Refactor readsalignment #196

Closed

2 tasks

douweschulte added a commit that referenced this issue Oct 13, 2022

Built a highlighting system for ambiguous nodes #176

d1f3d03

douweschulte added a commit that referenced this issue Oct 14, 2022

Added a start for higher order #176

b76b15c

douweschulte added a commit that referenced this issue Oct 17, 2022

Implemented tail simplification for #176 higher order DAGs

0edf375

douweschulte added a commit that referenced this issue Oct 17, 2022

Added backward higher-order tree #176

10105c2

douweschulte added a commit that referenced this issue Oct 21, 2022

Fixed last major bug for #176

c873d7b

douweschulte closed this as completed Oct 21, 2022
