Class: Minimap2::Aligner

Inherits:
Object
  • Object
show all
Defined in:
lib/minimap2/aligner.rb

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(fn_idx_in = nil, seq: nil, preset: nil, k: nil, w: nil, min_cnt: nil, min_chain_score: nil, min_dp_score: nil, bw: nil, bw_long: nil, best_n: nil, n_threads: 3, fn_idx_out: nil, max_frag_len: nil, extra_flags: nil, scoring: nil, sc_ambi: nil, max_chain_skip: nil) ⇒ Aligner

Create a new aligner.

Parameters:

  • fn_idx_in (String) (defaults to: nil)

    index or sequence file name.

  • seq (String) (defaults to: nil)

    a single sequence to index.

  • preset (String) (defaults to: nil)

    minimap2 preset.

    • map-pb : PacBio CLR genomic reads

    • map-ont : Oxford Nanopore genomic reads

    • map-hifi : PacBio HiFi/CCS genomic reads (v2.19 or later)

    • asm20 : PacBio HiFi/CCS genomic reads (v2.18 or earlier)

    • sr : short genomic paired-end reads

    • splice : spliced long reads (strand unknown)

    • splice:hq : Final PacBio Iso-seq or traditional cDNA

    • asm5 : intra-species asm-to-asm alignment

    • ava-pb : PacBio read overlap

    • ava-ont : Nanopore read overlap

  • k (Integer) (defaults to: nil)

    k-mer length, no larger than 28.

  • w (Integer) (defaults to: nil)

    minimizer window size, no larger than 255.

  • min_cnt (Integer) (defaults to: nil)

    minimum number of minimizers on a chain.

  • min_chain_score (Integer) (defaults to: nil)

    minimum chain score.

  • min_dp_score (defaults to: nil)
  • bw (Integer) (defaults to: nil)

    chaining and alignment band width. (initial chaining and extension)

  • bw_long (Integer) (defaults to: nil)

    chaining and alignment band width (RMQ-based rechaining and closing gaps)

  • best_n (Integer) (defaults to: nil)

    max number of alignments to return.

  • n_threads (Integer) (defaults to: 3)

    number of indexing threads.

  • fn_idx_out (String) (defaults to: nil)

    name of file to which the index is written. This parameter has no effect if seq is set.

  • max_frag_len (Integer) (defaults to: nil)
  • extra_flags (Integer) (defaults to: nil)

    additional flags defined in minimap.h.

  • scoring (Array) (defaults to: nil)

    scoring system. It is a tuple/list consisting of 4, 6 or 7 positive integers. The first 4 elements specify match scoring, mismatch penalty, gap open and gap extension penalty. The 5th and 6th elements, if present, set long-gap open and long-gap extension penalty. The 7th sets a mismatch penalty involving ambiguous bases.

Raises:

  • (ArgumentError)


41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
# File 'lib/minimap2/aligner.rb', line 41

def initialize(
  fn_idx_in = nil,
  seq: nil,
  preset: nil,
  k: nil,
  w: nil,
  min_cnt: nil,
  min_chain_score: nil,
  min_dp_score: nil,
  bw: nil,
  bw_long: nil,
  best_n: nil,
  n_threads: 3,
  fn_idx_out: nil,
  max_frag_len: nil,
  extra_flags: nil,
  scoring: nil,
  sc_ambi: nil,
  max_chain_skip: nil
)
  @idx_opt = FFI::IdxOpt.new
  @map_opt = FFI::MapOpt.new

  r = FFI.mm_set_opt(preset, idx_opt, map_opt)
  raise ArgumentError, "Unknown preset name: #{preset}" if r == -1

  # always perform alignment
  map_opt[:flag] |= 4
  idx_opt[:batch_size] = 0x7fffffffffffffff

  # override preset options
  idx_opt[:k] = k if k
  idx_opt[:w] = w if w
  map_opt[:min_cnt] = min_cnt if min_cnt
  map_opt[:min_chain_score] = min_chain_score if min_chain_score
  map_opt[:min_dp_max] = min_dp_score if min_dp_score
  map_opt[:bw] = bw if bw
  map_opt[:bw_long] = bw_long if bw_long
  map_opt[:best_n] = best_n if best_n
  map_opt[:max_frag_len] = max_frag_len if max_frag_len
  map_opt[:flag] |= extra_flags if extra_flags
  if scoring && scoring.size >= 4
    map_opt[:a] = scoring[0]
    map_opt[:b] = scoring[1]
    map_opt[:q] = scoring[2]
    map_opt[:e] = scoring[3]
    map_opt[:q2] = map_opt[:q]
    map_opt[:e2] = map_opt[:e]
    if scoring.size >= 6
      map_opt[:q2] = scoring[4]
      map_opt[:e2] = scoring[5]
      map_opt[:sc_ambi] = scoring[6] if scoring.size >= 7
    end
  end
  map_opt[:sc_ambi] = sc_ambi if sc_ambi
  map_opt[:max_chain_skip] = max_chain_skip if max_chain_skip

  if fn_idx_in
    warn "Since fn_idx_in is specified, the seq argument will be ignored." if seq
    reader = FFI.mm_idx_reader_open(fn_idx_in, idx_opt, fn_idx_out)

    # The Ruby version raises an error here
    raise "Cannot open : #{fn_idx_in}" if reader.null?

    @index = FFI.mm_idx_reader_read(reader, n_threads)
    FFI.mm_idx_reader_close(reader)
    FFI.mm_mapopt_update(map_opt, index)
    FFI.mm_idx_index_name(index)
  elsif seq
    @index = FFI.mappy_idx_seq(
      idx_opt[:w], idx_opt[:k], idx_opt[:flag] & 1,
      idx_opt[:bucket_bits], seq, seq.size
    )
    FFI.mm_mapopt_update(map_opt, index)
    map_opt[:mid_occ] = 1000 # don't filter high-occ seeds
  end
end

Instance Attribute Details

#idx_optObject (readonly)

Returns the value of attribute idx_opt.



5
6
7
# File 'lib/minimap2/aligner.rb', line 5

def idx_opt
  @idx_opt
end

#indexObject (readonly)

Returns the value of attribute index.



5
6
7
# File 'lib/minimap2/aligner.rb', line 5

def index
  @index
end

#map_optObject (readonly)

Returns the value of attribute map_opt.



5
6
7
# File 'lib/minimap2/aligner.rb', line 5

def map_opt
  @map_opt
end

Instance Method Details

#align(seq, seq2 = nil, name: nil, buf: nil, cs: false, md: false, max_frag_len: nil, extra_flags: nil) ⇒ Array

Note:

Name change: map -> align In the Ruby language, the name map means iterator. The original name is map, but here I use the method name align.

Note:

The use of Enumerator is being considered. The method names may change again.

Returns alignments.

Parameters:

  • seq (String)
  • seq2 (String) (defaults to: nil)
  • buf (FFI::TBuf) (defaults to: nil)
  • cs (true, false) (defaults to: false)
  • md (true, false) (defaults to: false)
  • max_frag_len (Integer) (defaults to: nil)
  • extra_flags (Integer) (defaults to: nil)

Returns:

  • (Array)

    alignments



138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
# File 'lib/minimap2/aligner.rb', line 138

def align(
  seq, seq2 = nil,
  name: nil,
  buf: nil,
  cs: false,
  md: false,
  max_frag_len: nil,
  extra_flags: nil
)
  return if index.null?
  return if (map_opt[:flag] & 4).zero? && (index[:flag] & 2).zero?

  map_opt[:max_frag_len] = max_frag_len if max_frag_len
  map_opt[:flag] |= extra_flags if extra_flags

  buf ||= FFI::TBuf.new
  km = FFI.mm_tbuf_get_km(buf)

  n_regs_ptr = ::FFI::MemoryPointer.new :int
  regs_ptr = FFI.mm_map_aux(index, name, seq, seq2, n_regs_ptr, buf, map_opt)
  n_regs = n_regs_ptr.read_int

  regs = Array.new(n_regs) do |i|
    FFI::Reg1.new(regs_ptr + i * FFI::Reg1.size)
  end

  hit = FFI::Hit.new

  cs_str     = ::FFI::MemoryPointer.new(::FFI::MemoryPointer.new(:string))
  m_cs_str   = ::FFI::MemoryPointer.new :int

  alignments = []

  i = 0
  begin
    while i < n_regs
      FFI.mm_reg2hitpy(index, regs[i], hit)

      c = hit[:cigar32].read_array_of_uint32(hit[:n_cigar32])
      cigar = c.map { |x| [x >> 4, x & 0xf] } # 32-bit CIGAR encoding -> Ruby array

      _cs = ""
      _md = ""
      if cs or md
        cur_seq = hit[:seg_id] > 0 && seq2 ? seq2 : seq

        if cs
          l_cs_str = FFI.mm_gen_cs(km, cs_str, m_cs_str, @index, regs[i], cur_seq, 1)
          _cs = cs_str.read_pointer.read_string(l_cs_str)
        end

        if md
          l_cs_str = FFI.mm_gen_md(km, cs_str, m_cs_str, @index, regs[i], cur_seq)
          _md = cs_str.read_pointer.read_string(l_cs_str)
        end
      end

      alignments << Alignment.new(hit, cigar, _cs, _md)

      FFI.mm_free_reg1(regs[i])
      i += 1
    end
  ensure
    while i < n_regs
      FFI.mm_free_reg1(regs[i])
      i += 1
    end
  end
  alignments
end

#free_indexObject

Explicitly releases the memory of the index object.



121
122
123
# File 'lib/minimap2/aligner.rb', line 121

def free_index
  FFI.mm_idx_destroy(index) unless index.null?
end

#kObject

k-mer length, no larger than 28



228
229
230
# File 'lib/minimap2/aligner.rb', line 228

def k
  index[:k]
end

#n_seqObject



238
239
240
# File 'lib/minimap2/aligner.rb', line 238

def n_seq
  index[:n_seq]
end

#seq(name, start = 0, stop = 0x7fffffff) ⇒ Object

Retrieve a subsequence from the index.

Parameters:

  • name
  • start (defaults to: 0)
  • stop (defaults to: 0x7fffffff)


214
215
216
217
218
219
220
221
222
223
224
# File 'lib/minimap2/aligner.rb', line 214

def seq(name, start = 0, stop = 0x7fffffff)
  return if index.null?
  return if (map_opt[:flag] & 4).zero? && (index[:flag] & 2).zero?

  lp = ::FFI::MemoryPointer.new(:int)
  s = FFI.mappy_fetch_seq(index, name, start, stop, lp)
  l = lp.read_int
  return nil if l == 0

  s.read_string(l)
end

#seq_namesObject



242
243
244
245
246
247
# File 'lib/minimap2/aligner.rb', line 242

def seq_names
  ptr = index[:seq].to_ptr
  Array.new(index[:n_seq]) do |i|
    FFI::IdxSeq.new(ptr + i * FFI::IdxSeq.size)[:name]
  end
end

#wObject

minimizer window size, no larger than 255



234
235
236
# File 'lib/minimap2/aligner.rb', line 234

def w
  index[:w]
end