Module: RedAmber::DataFrameSelectable

Included in:
DataFrame
Defined in:
lib/red_amber/data_frame_selectable.rb

Overview

Mix-in for the class DataFrame.

Instance Method Details

#[](key) ⇒ Vector
#[](keys) ⇒ DataFrame
#[](index) ⇒ DataFrame
#[](indices) ⇒ DataFrame

Select variables or records.

Overloads:

  • #[](key) ⇒ Vector
    Note:

    DataFrame#v(key) is a faster way to create a Vector from a variable.

    select a single variable and return it as a Vector.

    Parameters:

    • key (Symbol, String)

      key name to select.

    Returns:

    • (Vector)

      selected variable as a Vector.

  • #[](keys) ⇒ DataFrame

    select variables and return a DataFrame.

    Parameters:

    • keys (<Symbol, String>)

      key names to select.

    Returns:

    • (DataFrame)

      selected variables as a DataFrame.

  • #[](index) ⇒ DataFrame

    select records and return a DataFrame.

    Parameters:

    • index (Integer, Float, Range<Integer>, Vector, Arrow::Array)

      index of a row to select.

    Returns:

    • (DataFrame)

      selected records as a DataFrame.

  • #[](indices) ⇒ DataFrame

    select records and return a DataFrame.

    Parameters:

    • indices (<Integer, Float, Range<Integer>, Vector, Arrow::Array>)

      indices of rows to select.

    Returns:

    • (DataFrame)

      selected records as a DataFrame.

Raises:

  • (DataFrameArgumentError)



# File 'lib/red_amber/data_frame_selectable.rb', line 39

def [](*args)
  raise DataFrameArgumentError, 'self is an empty dataframe' if empty?

  case args
  in [] | [nil]
    return remove_all_values
  in [(Symbol | String) => k] if key? k
    return variables[k.to_sym]
  in [Integer => i]
    return take([i.negative? ? i + size : i])
  in [Vector => v]
    arrow_array = v.data
  in [(Arrow::Array | Arrow::ChunkedArray) => aa]
    arrow_array = aa
  else
    a = parse_args(args, size)
    return select_variables_by_keys(a) if a.symbols?
    return take(normalize_indices(Arrow::Array.new(a))) if a.integers?
    return remove_all_values if a.compact.empty?
    return filter_by_array(Arrow::BooleanArray.new(a)) if a.booleans?

    raise DataFrameArgumentError, "invalid arguments: #{args}"
  end

  return take(normalize_indices(arrow_array)) if arrow_array.numeric?
  return filter_by_array(arrow_array) if arrow_array.boolean?

  a = arrow_array.to_a
  return select_variables_by_keys(a) if a.symbols_or_strings?

  raise DataFrameArgumentError, "invalid arguments: #{args}"
end
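
The listing above relies on Ruby's `case ... in` pattern matching to tell a single key, a single integer, and a list of keys apart. The following is a minimal pure-Ruby sketch of that dispatch idea, assuming Ruby 3.0 or later. It uses a plain Hash of Arrays as a stand-in for a DataFrame; `TABLE` and `pick` are hypothetical names for illustration and are not part of RedAmber.

```ruby
# Hypothetical stand-in for a DataFrame: column name => column values.
TABLE = { x: [1, 2, 3], y: %w[a b c] }.freeze

# Illustrative dispatcher (an assumption-laden sketch, not RedAmber's code).
def pick(table, *args)
  size = table.values.first.size
  case args
  in [(Symbol | String) => k] if table.key?(k.to_sym)
    table[k.to_sym]                              # single key: return the column
  in [Integer => i]
    i += size if i.negative?                     # -1 addresses the last row
    table.transform_values { |col| [col[i]] }    # single index: one-row table
  in [*keys] if !keys.empty? && keys.all? { |k| k.is_a?(Symbol) && table.key?(k) }
    table.slice(*keys)                           # several keys: sub-table
  else
    raise ArgumentError, "invalid arguments: #{args}"
  end
end
```

In this sketch `pick(TABLE, :x)` returns the column Array, `pick(TABLE, -1)` returns a one-row Hash, and an unknown key falls through to the `else` branch and raises.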

#filter(*booleans) ⇒ Object



# File 'lib/red_amber/data_frame_selectable.rb', line 288

def filter(*booleans)
  booleans.flatten!
  case booleans
  in []
    return remove_all_values
  in [Arrow::BooleanArray => b]
    filter_by_array(b)
  else
    unless booleans.booleans?
      raise DataFrameArgumentError, 'Argument is not a boolean.'
    end

    filter_by_array(Arrow::BooleanArray.new(booleans))
  end
end
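
As a rough illustration of boolean-mask filtering, here is a sketch on plain Arrays. `filter_rows` is a hypothetical helper, and treating a nil mask entry as "drop" is an assumption of this sketch; in RedAmber the actual behavior is decided by `filter_by_array`, which is not shown on this page.

```ruby
# Keep the rows whose mask entry is true (sketch on plain Arrays).
def filter_rows(rows, mask)
  raise ArgumentError, 'mask and rows differ in length' unless rows.size == mask.size
  unless mask.all? { |m| m.nil? || m == true || m == false }
    raise ArgumentError, 'Argument is not a boolean.'
  end

  # Pair each row with its mask entry, keep the truthy pairs, drop the mask.
  rows.zip(mask).select { |_row, keep| keep }.map(&:first)
end
```

For example, `filter_rows(%w[a b c], [true, false, true])` keeps the first and third rows.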

#first(n_obs = 1) ⇒ Object



# File 'lib/red_amber/data_frame_selectable.rb', line 270

def first(n_obs = 1)
  head(n_obs)
end

#head(n_obs = 5) ⇒ Object



# File 'lib/red_amber/data_frame_selectable.rb', line 258

def head(n_obs = 5)
  raise DataFrameArgumentError, "Index is out of range #{n_obs}" if n_obs.negative?

  self[0...[n_obs, size].min]
end
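
The range `0...[n_obs, size].min` clamps the request to the number of available rows. The same arithmetic can be tried on a plain Array; `head_rows` below is a hypothetical stand-in, not a RedAmber method.

```ruby
# First n_obs elements, clamped to the Array's length (sketch on plain Arrays).
def head_rows(rows, n_obs = 5)
  raise ArgumentError, "Index is out of range #{n_obs}" if n_obs.negative?

  # The exclusive range 0...k selects positions 0 through k - 1.
  rows[0...[n_obs, rows.size].min]
end
```

Asking for more rows than exist simply returns everything, and `n_obs = 0` gives an empty result because `0...0` selects nothing.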

#last(n_obs = 1) ⇒ Object



# File 'lib/red_amber/data_frame_selectable.rb', line 274

def last(n_obs = 1)
  tail(n_obs)
end

#remove(row) {|self| ... } ⇒ DataFrame
#remove(rows) {|self| ... } ⇒ DataFrame

Select records and remove them to create a DataFrame of the remaining records.

Overloads:

  • #remove(row) {|self| ... } ⇒ DataFrame

    select a record and remove it to create a DataFrame of the remaining records.

    • The order of records in self will be preserved.

    Parameters:

    • row (Integer, Float, Range<Integer>, Vector, Arrow::Array)

      a row index to remove.

    Yields:

    • (self)

      gives self to the block.

      Note: The block is evaluated within the context of self, so it can access self's instance variables and private methods.

    Yield Returns:

    • (Integer, Float, Range<Integer>, Vector, Arrow::Array)

      a row index to remove.

    Returns:

    • (DataFrame)

      remaining records as a DataFrame.

  • #remove(rows) {|self| ... } ⇒ DataFrame

    select records and remove them to create a DataFrame of the remaining records.

    • The order of records in self will be preserved.

    Parameters:

    • rows (Integer, Float, Range<Integer>, Vector, Arrow::Array)

      row indices to remove.

    Yields:

    • (self)

      gives self to the block.

      Note: The block is evaluated within the context of self, so it can access self's instance variables and private methods.

    Yield Returns:

    • (<Integer, Float, Range<Integer>, Vector, Arrow::Array>)

      row indices to remove.

    Returns:

    • (DataFrame)

      remaining records as a DataFrame.

Raises:

  • (DataFrameArgumentError)



# File 'lib/red_amber/data_frame_selectable.rb', line 214

def remove(*args, &block)
  raise DataFrameArgumentError, 'Self is an empty dataframe' if empty?

  if block
    unless args.empty?
      raise DataFrameArgumentError, 'Must not specify both arguments and block.'
    end

    args = [instance_eval(&block)]
  end

  arrow_array =
    case args
    in [] | [[]] | [nil]
      return self
    in [Vector => v]
      v.data
    in [(Arrow::Array | Arrow::ChunkedArray) => aa]
      aa
    else
      Arrow::Array.new(parse_args(args, size))
    end

  if arrow_array.boolean?
    filter_by_array(arrow_array.primitive_invert)
  elsif arrow_array.numeric?
    remover = normalize_indices(arrow_array).to_a
    return self if remover.empty?

    slicer = indices.to_a - remover.map(&:to_i)
    return remove_all_values if slicer.empty?

    take(slicer)
  else
    raise DataFrameArgumentError, "Invalid argument #{args}"
  end
end
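
The two removal paths in the listing (invert a boolean mask, or subtract the given indices from the full index list) can be sketched on plain Arrays as follows. `remove_rows` is a hypothetical helper written for illustration. It mirrors the `indices.to_a - remover` step but is not RedAmber's implementation.

```ruby
# Remove rows selected by a boolean mask or by integer indices (sketch).
def remove_rows(rows, selector)
  return rows if selector.empty?

  if selector.all? { |s| s == true || s == false }
    # Boolean mask: rows marked true are removed, the rest are kept.
    rows.zip(selector).reject { |_row, drop| drop }.map(&:first)
  elsif selector.all?(Numeric)
    size = rows.size
    # Normalize negative indices so that -1 refers to the last row.
    remover = selector.map { |i| i.to_i.negative? ? i.to_i + size : i.to_i }
    # Array difference keeps the surviving indices in their original order.
    keep = (0...size).to_a - remover
    rows.values_at(*keep)
  else
    raise ArgumentError, "Invalid argument #{selector}"
  end
end
```

Because the surviving indices come from an Array difference against `0...size`, the order of the remaining rows is preserved, and repeated indices in the selector have no extra effect.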

#remove_nil ⇒ Object Also known as: drop_nil



# File 'lib/red_amber/data_frame_selectable.rb', line 252

def remove_nil
  func = Arrow::Function.find(:drop_null)
  DataFrame.create(func.execute([table]).value)
end
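
Arrow's `drop_null` function, applied to a table, is understood to drop every row in which any column holds a null. A plain-Ruby sketch of that row-wise behavior, using a Hash of Arrays as a stand-in table (`drop_nil_rows` is a hypothetical name, and the code is an illustration rather than what Arrow executes):

```ruby
# Drop every row that has nil in at least one column (sketch).
def drop_nil_rows(table)
  return table if table.empty?

  n_rows = table.values.first.size
  # Indices of the rows in which no column is nil.
  keep = (0...n_rows).reject { |i| table.values.any? { |col| col[i].nil? } }
  table.transform_values { |col| col.values_at(*keep) }
end
```

With columns `x: [1, nil, 3]` and `y: ['a', 'b', nil]`, only the first row survives, since each of the other two rows has a nil somewhere.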

#slice(row) {|self| ... } ⇒ DataFrame
#slice(rows) {|self| ... } ⇒ DataFrame

Select records to create a DataFrame.

Overloads:

  • #slice(row) {|self| ... } ⇒ DataFrame

    select a record and return a DataFrame.

    Parameters:

    • row (Integer, Float, Range<Integer>, Vector, Arrow::Array)

      a row index to select.

    Yields:

    • (self)

      gives self to the block.

      Note: The block is evaluated within the context of self, so it can access self's instance variables and private methods.

    Yield Returns:

    • (Integer, Float, Range<Integer>, Vector, Arrow::Array)

      a row index to select.

    Returns:

    • (DataFrame)

      selected records as a DataFrame.

  • #slice(rows) {|self| ... } ⇒ DataFrame

    select records and return a DataFrame.

    • Duplicate indices are accepted; the same record is returned once for each occurrence.

    • Records are returned in the order of the specified indices.

    Parameters:

    • rows (Integer, Float, Range<Integer>, Vector, Arrow::Array)

      row indices to select.

    Yields:

    • (self)

      gives self to the block.

      Note: The block is evaluated within the context of self, so it can access self's instance variables and private methods.

    Yield Returns:

    • (<Integer, Float, Range<Integer>, Vector, Arrow::Array>)

      row indices to select.

    Returns:

    • (DataFrame)

      selected records as a DataFrame.

Raises:

  • (DataFrameArgumentError)



# File 'lib/red_amber/data_frame_selectable.rb', line 110

def slice(*args, &block)
  raise DataFrameArgumentError, 'Self is an empty dataframe' if empty?

  if block
    unless args.empty?
      raise DataFrameArgumentError, 'Must not specify both arguments and block.'
    end

    args = [instance_eval(&block)]
  end

  arrow_array =
    case args
    in [] | [[]]
      return remove_all_values
    in [Vector => v]
      v.data
    in [(Arrow::Array | Arrow::ChunkedArray) => aa]
      aa
    else
      Arrow::Array.new(parse_args(args, size))
    end

  if arrow_array.numeric?
    take(normalize_indices(arrow_array))
  elsif arrow_array.boolean?
    filter_by_array(arrow_array)
  elsif arrow_array.to_a.compact.empty?
    # Ruby 3.0.4 does not accept Arrow::Array#compact here. 2.7.6 and 3.1.2 is OK.
    remove_all_values
  else
    raise DataFrameArgumentError, "invalid arguments: #{args}"
  end
end
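
Selecting by integer indices behaves like a "take" operation: the output follows the order of the indices, and a repeated index repeats the row. A small sketch on plain Arrays (`slice_rows` is a hypothetical helper, and the explicit range check is a choice made for this sketch, not a statement about `normalize_indices`):

```ruby
# Take rows by index, in the order given; negative indices count from the end.
def slice_rows(rows, indices)
  size = rows.size
  normalized = indices.map { |i| i.negative? ? i + size : i }

  # Sketch-only guard: reject anything that still falls outside 0...size.
  out_of_range = normalized.reject { |i| (0...size).cover?(i) }
  raise IndexError, "index out of range: #{out_of_range}" unless out_of_range.empty?

  rows.values_at(*normalized)
end
```

Calling `slice_rows(%w[a b c], [2, 0, 2, -1])` returns the third row, the first row, then the third row twice more (once for the repeated `2` and once for `-1`).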

#slice_by(key, keep_key: false, &block) ⇒ Object



# File 'lib/red_amber/data_frame_selectable.rb', line 145

def slice_by(key, keep_key: false, &block)
  raise DataFrameArgumentError, 'Self is an empty dataframe' if empty?
  raise DataFrameArgumentError, 'No block given' unless block
  raise DataFrameArgumentError, "#{key} is not a key of self" unless key?(key)
  return self if key.nil?

  slicer = instance_eval(&block)
  return DataFrame.new unless slicer

  if slicer.is_a?(Range)
    from = slicer.begin
    from =
      if from.is_a?(String)
        self[key].index(from)
      elsif from.nil?
        0
      elsif from < 0
        size + from
      else
        from
      end
    to = slicer.end
    to =
      if to.is_a?(String)
        self[key].index(to)
      elsif to.nil?
        size - 1
      elsif to < 0
        size + to
      else
        to
      end
    slicer = (from..to).to_a
  else
    slicer = slicer.map { |x| x.is_a?(String) ? self[key].index(x) : x }
  end

  taken = take(normalize_indices(Arrow::Array.new(slicer)))
  keep_key ? taken : taken.drop(key)
end
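
The Range branch of the listing resolves each endpoint separately: a String is looked up in the key column, nil falls back to the first or last row, and a negative integer counts from the end. Here is a sketch of that resolution on a plain Array of labels. `resolve_endpoint` and `resolve_range` are hypothetical names, and raising `KeyError` for an unknown label is an assumption of this sketch.

```ruby
# Turn one Range endpoint into a non-negative row position (sketch).
def resolve_endpoint(labels, endpoint, default)
  case endpoint
  when String
    labels.index(endpoint) || raise(KeyError, "no such label: #{endpoint}")
  when nil
    default                                   # open end of the range
  when Integer
    endpoint.negative? ? labels.size + endpoint : endpoint
  else
    raise ArgumentError, "unsupported endpoint: #{endpoint.inspect}"
  end
end

# Expand a Range over labels or positions into an Array of row positions.
def resolve_range(labels, range)
  from = resolve_endpoint(labels, range.begin, 0)
  to   = resolve_endpoint(labels, range.end, labels.size - 1)
  (from..to).to_a                             # inclusive at both ends
end
```

Like the listing, the sketch builds `(from..to)` and so treats the range as inclusive at both ends, whether or not the original Range excluded its end.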

#tail(n_obs = 5) ⇒ Object



# File 'lib/red_amber/data_frame_selectable.rb', line 264

def tail(n_obs = 5)
  raise DataFrameArgumentError, "Index is out of range #{n_obs}" if n_obs.negative?

  self[-[n_obs, size].min..]
end
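
The endless range `-[n_obs, size].min..` starts counting from the end and is clamped so it never reaches past the first row. The same expression can be tried on a plain Array (`tail_rows` is a hypothetical stand-in, assuming Ruby 2.6 or later for endless ranges):

```ruby
# Last n_obs elements, clamped to the Array's length (sketch on plain Arrays).
def tail_rows(rows, n_obs = 5)
  raise ArgumentError, "Index is out of range #{n_obs}" if n_obs.negative?

  # -k.. selects from the k-th element from the end through the last one.
  rows[-[n_obs, rows.size].min..]
end
```

One edge case is worth knowing about: on a plain Array, `n_obs = 0` produces `-0..`, which is the same as `0..` and selects every element. Whether `DataFrame#tail(0)` behaves the same way depends on how the range is parsed inside `#[]`, which is not shown on this page.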

#take(index_array) ⇒ Object



# File 'lib/red_amber/data_frame_selectable.rb', line 282

def take(index_array)
  DataFrame.create(@table.take(index_array))
end

#v(key) ⇒ Object

Select a variable by its key, given as a String or Symbol.



# File 'lib/red_amber/data_frame_selectable.rb', line 73

def v(key)
  unless key.is_a?(Symbol) || key.is_a?(String)
    raise DataFrameArgumentError, "Key is not a Symbol or a String: [#{key}]"
  end
  raise DataFrameArgumentError, "Key does not exist: [#{key}]" unless key? key

  variables[key.to_sym]
end