Module: RedAmber::DataFrameCombinable

Included in:
DataFrame
Defined in:
lib/red_amber/data_frame_combinable.rb

Overview

Mix-in for the class DataFrame.

Instance Method Summary

Instance Method Details

#anti_join(other, suffix: '.1') ⇒ DataFrame #anti_join(other, join_keys, suffix: '.1') ⇒ DataFrame #anti_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

Return records of self that do not have a match in other.

  • Same as `#join` with `type: :left_anti`

  • A kind of filtering join.

Overloads:

  • #anti_join(other, suffix: '.1') ⇒ DataFrame

    If `join_keys` is not specified, the keys common to self and other are used (natural keys). Returns the joined DataFrame.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    without key (use implicit common key)

    df.anti_join(other)
    #=>
      KEY           X1
      <string> <uint8>
    0 C              3

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #anti_join(other, join_keys, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with a key

    df.anti_join(other, :KEY)
    #=>
      KEY           X1
      <string> <uint8>
    0 C              3

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_keys (String, Symbol, Array<String, Symbol>)

      A key or keys to match.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #anti_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df2 = DataFrame.new(KEY1: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY1          X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other2 = DataFrame.new(KEY2: %w[A B D], X2: [true, false, nil])
    #=>
      KEY2     X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with key pairs

    df2.anti_join(other2, { left: :KEY1, right: :KEY2 })
    #=>
      KEY1          X1
      <string> <uint8>
    0 C              3

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_key_pairs (Hash)

      Pairs of a key name or key names to match in left and right.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Options Hash (join_key_pairs):

    • :left (String, Symbol, Array<String, Symbol>)

      Join keys in `self`.

    • :right (String, Symbol, Array<String, Symbol>)

      Join keys in `other`.

    Returns:



# File 'lib/red_amber/data_frame_combinable.rb', line 515

def anti_join(other, join_keys = nil, suffix: '.1')
  join(other, join_keys, type: :left_anti, suffix: suffix)
end
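
The filtering behaviour can be illustrated without RedAmber. The following is a minimal plain-Ruby sketch, assuming rows are represented as an Array of Hashes. `anti_join_rows` is a hypothetical helper, not part of the library, and it ignores Arrow types and key renaming.

```ruby
# Hypothetical helper (not part of RedAmber): keeps the rows of `left`
# whose values for `keys` have no counterpart in `right`.
def anti_join_rows(left, right, keys)
  keys = Array(keys)
  seen = right.map { |row| row.values_at(*keys) }
  left.reject { |row| seen.include?(row.values_at(*keys)) }
end

left  = [{ KEY: 'A', X1: 1 }, { KEY: 'B', X1: 2 }, { KEY: 'C', X1: 3 }]
right = [{ KEY: 'A', X2: true }, { KEY: 'B', X2: false }, { KEY: 'D', X2: nil }]

# Only the row with KEY 'C' has no match in `right`.
result = anti_join_rows(left, right, :KEY)
p result
```

Replacing `reject` with `select` in the helper would give the corresponding semi-join behaviour.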

#concatenate(*other) ⇒ DataFrame Also known as: concat, bind_rows

Note:

The `#types` of self must be the same as those of `other`.

Concatenate other dataframes or tables onto the bottom of self.

Examples:

df    = DataFrame.new(x: [1, 2], y: ['A', 'B'])
other = DataFrame.new(x: [3, 4], y: ['C', 'D'])
[df.types, other.types]
#=>
[[:uint8, :string], [:uint8, :string]]

df.concatenate(other)
#=>
        x y
  <uint8> <string>
0       1 A
1       2 B
2       3 C
3       4 D

Parameters:

  • other (DataFrame, Arrow::Table, Array<DataFrame, Arrow::Table>)

    DataFrames or Tables to concatenate.

Returns:



# File 'lib/red_amber/data_frame_combinable.rb', line 32

def concatenate(*other)
  case other
  in [] | [nil] | [[]]
    return self
  in [Array => array]
    # Nop
  else
    array = other
  end

  table_array = array.map do |e|
    case e
    when Arrow::Table
      e
    when DataFrame
      e.table
    else
      raise DataFrameArgumentError, "#{e} is not a Table or a DataFrame"
    end
  end

  DataFrame.create(table.concatenate(table_array))
end
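
`#concatenate` accepts either several positional arguments or a single Array. The following stand-alone sketch isolates that argument handling so it can be tried in isolation. `normalize_others` is a hypothetical name, and the `case`/`in` syntax assumes Ruby 3.0 or later.

```ruby
# Hypothetical stand-alone version of the argument handling shown above:
# returns a flat Array of the objects that would be appended.
def normalize_others(*other)
  case other
  in [] | [nil] | [[]]
    []      # nothing to append
  in [Array => array]
    array   # a single Array argument is unwrapped
  else
    other   # several positional arguments are used as-is
  end
end

p normalize_others()
p normalize_others(nil)
p normalize_others([:a, :b])
p normalize_others(:a, :b)
```

The first three "empty" forms all mean "nothing to append", which is why the real method returns self for them.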

#difference(other) ⇒ DataFrame Also known as: setdiff

Select records appearing in self but not in other.

  • Same as `#join` with `type: :left_anti` when the keys in self are the same as those in other.

  • A kind of set operation.

Examples:

df3 = DataFrame.new(
  KEY1: %w[A B C],
  KEY2: [1, 2, 3]
)
#=>
  KEY1        KEY2
  <string> <uint8>
0 A              1
1 B              2
2 C              3

other3 = DataFrame.new(
  KEY1: %w[A B D],
  KEY2: [1, 4, 5]
)
#=>
  KEY1        KEY2
  <string> <uint8>
0 A              1
1 B              4
2 D              5
df3.difference(other3)
#=>
  KEY1        KEY2
  <string> <uint8>
0 B              2
1 C              3

other3.difference(df3)
#=>
  KEY1        KEY2
  <string> <uint8>
0 B              4
1 D              5

Parameters:

  • other (DataFrame, Arrow::Table)

    A DataFrame or a Table to be joined with self.

Returns:



# File 'lib/red_amber/data_frame_combinable.rb', line 603

def difference(other)
  unless keys == other.keys.map(&:to_sym)
    raise DataFrameArgumentError, 'keys are not same with self and other'
  end

  join(other, keys, type: :left_anti)
end
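
As a rough illustration of the set semantics, here is a plain-Ruby sketch that treats each record as a Hash. `difference_rows` is a hypothetical helper, not the library's implementation. It mirrors the key check above by raising when the column names differ.

```ruby
# Hypothetical helper (not part of RedAmber): records are Hashes and
# both sides must use the same column names in the same order.
def difference_rows(left, right)
  unless left.flat_map(&:keys).uniq == right.flat_map(&:keys).uniq
    raise ArgumentError, 'keys are not same with self and other'
  end

  # Array#- compares Hashes by content, so whole records are matched.
  left - right
end

rows3  = [{ KEY1: 'A', KEY2: 1 }, { KEY1: 'B', KEY2: 2 }, { KEY1: 'C', KEY2: 3 }]
other3 = [{ KEY1: 'A', KEY2: 1 }, { KEY1: 'B', KEY2: 4 }, { KEY1: 'D', KEY2: 5 }]

p difference_rows(rows3, other3)
p difference_rows(other3, rows3)
```

The operation is not symmetric: swapping the receiver and the argument yields the records unique to the other side.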

#full_join(other, suffix: '.1') ⇒ DataFrame #full_join(other, join_keys, suffix: '.1') ⇒ DataFrame #full_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame Also known as: outer_join

Join another DataFrame or Table, leaving all records.

  • Same as `#join` with `type: :full_outer`

  • A kind of mutating join.

Overloads:

  • #full_join(other, suffix: '.1') ⇒ DataFrame

    If `join_keys` is not specified, the keys common to self and other are used (natural keys). Returns the joined DataFrame.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    without key (use implicit common key)

    df.full_join(other)
    #=>
      KEY           X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    2 C              3 (nil)
    3 D          (nil) (nil)

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #full_join(other, join_keys, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with a key

    df.full_join(other, :KEY)
    #=>
      KEY           X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    2 C              3 (nil)
    3 D          (nil) (nil)

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_keys (String, Symbol, Array<String, Symbol>)

      A key or keys to match.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #full_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df2 = DataFrame.new(KEY1: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY1          X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other2 = DataFrame.new(KEY2: %w[A B D], X2: [true, false, nil])
    #=>
      KEY2     X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with key pairs

    df2.full_join(other2, { left: :KEY1, right: :KEY2 })
    #=>
      KEY1          X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    2 C              3 (nil)
    3 D          (nil) (nil)

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_key_pairs (Hash)

      Pairs of a key name or key names to match in left and right.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Options Hash (join_key_pairs):

    • :left (String, Symbol, Array<String, Symbol>)

      Join keys in `self`.

    • :right (String, Symbol, Array<String, Symbol>)

      Join keys in `other`.

    Returns:



# File 'lib/red_amber/data_frame_combinable.rb', line 304

def full_join(other, join_keys = nil, suffix: '.1')
  join(other, join_keys, type: :full_outer, suffix: suffix)
end
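
To make the "leaving all records" behaviour concrete, the following plain-Ruby sketch performs a full outer join on Arrays of Hashes with a single shared key. `full_join_rows` is a hypothetical helper written for illustration only. It does not use Arrow and is not how RedAmber implements the join.

```ruby
# Hypothetical helper (not part of RedAmber): full outer join of two
# Arrays of Hashes on one shared key. Missing values become nil.
def full_join_rows(left, right, key)
  blank_left  = (left.flat_map(&:keys).uniq - [key]).to_h { |c| [c, nil] }
  blank_right = (right.flat_map(&:keys).uniq - [key]).to_h { |c| [c, nil] }
  right_by_key = right.group_by { |row| row[key] }

  # Every left row is kept; unmatched ones are padded with nil.
  rows = left.flat_map do |l|
    partners = right_by_key.fetch(l[key], [blank_right])
    partners.map { |r| l.merge(r) }
  end

  # Right rows without a partner are appended, padded on the left side.
  left_values = left.map { |l| l[key] }
  extra = right.reject { |r| left_values.include?(r[key]) }
  rows + extra.map { |r| { key => r[key] }.merge(blank_left, r) }
end

left  = [{ KEY: 'A', X1: 1 }, { KEY: 'B', X1: 2 }, { KEY: 'C', X1: 3 }]
right = [{ KEY: 'A', X2: true }, { KEY: 'B', X2: false }, { KEY: 'D', X2: nil }]

full = full_join_rows(left, right, :KEY)
full.each { |row| p row }
```

The four resulting rows correspond to the four rows in the examples above: two matched, one left-only and one right-only.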

#inner_join(other, suffix: '.1') ⇒ DataFrame #inner_join(other, join_keys, suffix: '.1') ⇒ DataFrame #inner_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

Join another DataFrame or Table, leaving only the matching records.

  • Same as `#join` with `type: :inner`

  • A kind of mutating join.

Overloads:

  • #inner_join(other, suffix: '.1') ⇒ DataFrame

    If `join_keys` is not specified, the keys common to self and other are used (natural keys). Returns the joined DataFrame.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    without key (use implicit common key)

    df.inner_join(other)
    #=>
      KEY           X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #inner_join(other, join_keys, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with a key

    df.inner_join(other, :KEY)
    #=>
      KEY           X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_keys (String, Symbol, Array<String, Symbol>)

      A key or keys to match.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #inner_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df2 = DataFrame.new(KEY1: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY1          X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other2 = DataFrame.new(KEY2: %w[A B D], X2: [true, false, nil])
    #=>
      KEY2     X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with key pairs

    df2.inner_join(other2, { left: :KEY1, right: :KEY2 })
    #=>
      KEY1          X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_key_pairs (Hash)

      Pairs of a key name or key names to match in left and right.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Options Hash (join_key_pairs):

    • :left (String, Symbol, Array<String, Symbol>)

      Join keys in `self`.

    • :right (String, Symbol, Array<String, Symbol>)

      Join keys in `other`.

    Returns:



# File 'lib/red_amber/data_frame_combinable.rb', line 247

def inner_join(other, join_keys = nil, suffix: '.1')
  join(other, join_keys, type: :inner, suffix: suffix)
end

#intersect(other) ⇒ DataFrame

Select records appearing in both self and other.

  • Same as `#join` with `type: :inner` when the keys in self are the same as those in other.

  • A kind of set operation.

Examples:

df3 = DataFrame.new(
  KEY1: %w[A B C],
  KEY2: [1, 2, 3]
)
#=>
  KEY1        KEY2
  <string> <uint8>
0 A              1
1 B              2
2 C              3

other3 = DataFrame.new(
  KEY1: %w[A B D],
  KEY2: [1, 4, 5]
)
#=>
  KEY1        KEY2
  <string> <uint8>
0 A              1
1 B              4
2 D              5
df3.intersect(other3)
#=>
  KEY1        KEY2
  <string> <uint8>
0 A              1

Parameters:

  • other (DataFrame, Arrow::Table)

    A DataFrame or a Table to be joined with self.

Returns:



# File 'lib/red_amber/data_frame_combinable.rb', line 547

def intersect(other)
  unless keys == other.keys.map(&:to_sym)
    raise DataFrameArgumentError, 'keys are not same with self and other'
  end

  join(other, keys, type: :inner)
end

#join(other, type: :inner, suffix: '.1') ⇒ DataFrame #join(other, join_keys, type: :inner, suffix: '.1') ⇒ DataFrame #join(other, join_key_pairs, type: :inner, suffix: '.1') ⇒ DataFrame

Note:

The order of the joined results may not be preserved. Use an additional index column to sort after joining.
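
The idea of carrying an index column through a join can be sketched in plain Ruby. This is only an illustration under stated assumptions: rows are Hashes, `shuffle` stands in for a join that does not preserve order, and the column name `:idx` is hypothetical.

```ruby
rows = [{ KEY: 'A', X1: 1 }, { KEY: 'B', X1: 2 }, { KEY: 'C', X1: 3 }]

# Attach a temporary index column before joining.
indexed = rows.each_with_index.map { |row, i| row.merge(idx: i) }

# Stand-in for a join whose output order is not guaranteed.
joined = indexed.shuffle(random: Random.new(42))

# Sort by the index column, then drop it again.
restored = joined.sort_by { |row| row[:idx] }
                 .map { |row| row.reject { |name, _| name == :idx } }

p restored
```

After sorting and dropping the helper column, the rows are back in their original order.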

Overloads:

  • #join(other, type: :inner, suffix: '.1') ⇒ DataFrame

    If `join_keys` is not specified, the keys common to self and other are used (natural keys). Returns the joined DataFrame.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)
    df.join(other)
    #=>
      KEY           X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    
    df.join(other, type: :full_outer)
    #=>
      KEY           X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    2 C              3 (nil)
    3 D          (nil) (nil)

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • type (:left_semi, :right_semi, :left_anti, :right_anti, :inner, :left_outer, :right_outer, :full_outer) (defaults to: :inner)

      Type of join.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #join(other, join_keys, type: :inner, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df3 = DataFrame.new(
      KEY1: %w[A B C],
      KEY2: [1, 2, 3]
    )
    #=>
      KEY1        KEY2
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other3 = DataFrame.new(
      KEY1: %w[A B D],
      KEY2: [1, 4, 5]
    )
    #=>
      KEY1        KEY2
      <string> <uint8>
    0 A              1
    1 B              4
    2 D              5

    join keys in an Array

    df3.join(other3, [:KEY1, :KEY2])
    #=>
      KEY1        KEY2
      <string> <uint8>
    0 A              1

    partial join key and suffix

    df3.join(other3, :KEY1, suffix: '.a')
    #=>
      KEY1        KEY2  KEY2.a
      <string> <uint8> <uint8>
    0 A              1       1
    1 B              2       4

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_keys (String, Symbol, Array<String, Symbol>)

      A key or keys to match.

    • type (:left_semi, :right_semi, :left_anti, :right_anti, :inner, :left_outer, :right_outer, :full_outer) (defaults to: :inner)

      Type of join.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #join(other, join_key_pairs, type: :inner, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df4 = DataFrame.new(
      X1: %w[A B C],
      Y: %w[D E F]
    )
    #=>
      X1       Y
      <string> <string>
    0 A        D
    1 B        E
    2 C        F
    
    other4 = DataFrame.new(
      X2: %w[A B D],
      Y:  %w[e E E]
    )
    #=>
      X2       Y
      <string> <string>
    0 A        e
    1 B        E
    2 D        E

    without options

    df4.join(other4)
    #=>
      X1       Y        X2
      <string> <string> <string>
    0 B        E        D
    1 B        E        B

    join by key pairs

    df4.join(other4, { left: [:X1, :Y], right: [:X2, :Y] })
    #=>
      X1       Y
      <string> <string>
    0 B        E

    join by key pairs, using renaming by suffix

    df4.join(other4, { left: :X1, right: :X2 })
    #=>
      X1       Y        Y.1
      <string> <string> <string>
    0 A        D        e
    1 B        E        E

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_key_pairs (Hash)

      Pairs of a key name or key names to match in left and right.

    • type (:left_semi, :right_semi, :left_anti, :right_anti, :inner, :left_outer, :right_outer, :full_outer) (defaults to: :inner)

      Type of join.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Options Hash (join_key_pairs):

    • :left (String, Symbol, Array<String, Symbol>)

      Join keys in `self`.

    • :right (String, Symbol, Array<String, Symbol>)

      Join keys in `other`.

    Returns:



# File 'lib/red_amber/data_frame_combinable.rb', line 724

def join(other, join_keys = nil, type: :inner, suffix: '.1')
  case other
  when DataFrame
    other = other.table
  when Arrow::Table
    # Nop
  else
    raise DataFrameArgumentError, 'other must be a DataFrame or an Arrow::Table'
  end

  table_keys = table.keys
  other_keys = other.keys
  type = type.to_sym

  # natural keys (implicit common keys)
  join_keys ||= table_keys.intersection(other_keys)

  # This is not necessary if additional procedure is contributed to Red Arrow.
  if join_keys.is_a?(Hash)
    left_keys = join_keys[:left]
    right_keys = join_keys[:right]
  else
    left_keys = join_keys
    right_keys = join_keys
  end
  left_keys = Array(left_keys).map(&:to_s)
  right_keys = Array(right_keys).map(&:to_s)

  case type
  when :full_outer, :left_semi, :left_anti, :right_semi, :right_anti
    left_outputs = nil
    right_outputs = nil
  when :inner, :left_outer
    left_outputs = table_keys
    right_outputs = other_keys - right_keys
  when :right_outer
    left_outputs = table_keys - left_keys
    right_outputs = other_keys
  end

  # Should we rescue errors in Arrow::Table#join for usability ?
  joined_table =
    table.join(other, join_keys,
               type: type,
               left_outputs: left_outputs,
               right_outputs: right_outputs)

  case type
  when :inner, :left_outer, :left_semi, :left_anti, :right_semi, :right_anti
    if joined_table.keys.uniq!
      DataFrame.create(rename_table(joined_table, n_keys, suffix))
    else
      DataFrame.create(joined_table)
    end
  when :full_outer
    renamed_table = rename_table(joined_table, n_keys, suffix)
    renamed_keys = renamed_table.keys
    dropper = []
    DataFrame.create(renamed_table).assign do |df|
      left_keys.map do |left_key|
        i_left_key = renamed_keys.index(left_key)
        right_key = renamed_keys[i_left_key + table_keys.size]
        dropper << right_key
        [left_key.to_sym, merge_array(df[left_key].data, df[right_key].data)]
      end
    end.drop(dropper)
  when :right_outer
    if joined_table.keys.uniq!
      DataFrame.create(rename_table(joined_table, left_outputs.size, suffix))
    else
      DataFrame.create(joined_table)
    end.pick do
      [right_keys, keys.map(&:to_s) - right_keys]
    end
  end
end
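
Two details of the method above are easy to try in isolation: how the three accepted forms of `join_keys` are reduced to left and right key lists, and why `suffix` needs `#succ`. The sketch below is plain Ruby. `split_join_keys` and `unique_name` are hypothetical names, and `unique_name` only shows one plausible use of `#succ`; it is not a copy of RedAmber's private renaming helper.

```ruby
# Hypothetical stand-alone version of the key handling in #join:
# a Hash supplies separate left/right keys, anything else is shared.
def split_join_keys(join_keys)
  if join_keys.is_a?(Hash)
    left, right = join_keys.values_at(:left, :right)
  else
    left = right = join_keys
  end
  [Array(left).map(&:to_s), Array(right).map(&:to_s)]
end

# Hypothetical illustration of a #succ-capable suffix: keep
# incrementing the suffix until the candidate name is unused.
def unique_name(name, taken, suffix)
  candidate = "#{name}#{suffix}"
  while taken.include?(candidate)
    suffix = suffix.succ
    candidate = "#{name}#{suffix}"
  end
  candidate
end

p split_join_keys(:KEY)
p split_join_keys({ left: %i[X1 Y], right: %i[X2 Y] })
p unique_name('KEY2', %w[KEY1 KEY2], '.1')
p unique_name('KEY2', %w[KEY1 KEY2 KEY2.1], '.1')
```

Because `'.1'.succ` is `'.2'`, a String suffix can always be advanced to a name that is not yet taken.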

#left_join(other, suffix: '.1') ⇒ DataFrame #left_join(other, join_keys, suffix: '.1') ⇒ DataFrame #left_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

Join matching values to self from other.

  • Same as `#join` with `type: :left_outer`

  • A kind of mutating join.

Overloads:

  • #left_join(other, suffix: '.1') ⇒ DataFrame

    If `join_keys` is not specified, the keys common to self and other are used (natural keys). Returns the joined DataFrame.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    without key (use implicit common key)

    df.left_join(other)
    #=>
      KEY           X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    2 C              3 (nil)

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #left_join(other, join_keys, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with a key

    df.left_join(other, :KEY)
    #=>
      KEY           X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    2 C              3 (nil)

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_keys (String, Symbol, Array<String, Symbol>)

      A key or keys to match.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #left_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df2 = DataFrame.new(KEY1: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY1          X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other2 = DataFrame.new(KEY2: %w[A B D], X2: [true, false, nil])
    #=>
      KEY2     X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with key pairs

    df2.left_join(other2, { left: :KEY1, right: :KEY2 })
    #=>
      KEY1          X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    2 C              3 (nil)

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_key_pairs (Hash)

      Pairs of a key name or key names to match in left and right.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Options Hash (join_key_pairs):

    • :left (String, Symbol, Array<String, Symbol>)

      Join keys in `self`.

    • :right (String, Symbol, Array<String, Symbol>)

      Join keys in `other`.

    Returns:



# File 'lib/red_amber/data_frame_combinable.rb', line 360

def left_join(other, join_keys = nil, suffix: '.1')
  join(other, join_keys, type: :left_outer, suffix: suffix)
end

#merge(*other) ⇒ DataFrame Also known as: bind_cols

Note:

The `#size` of self must be the same as that of `other`.

Note:

self and other must not share the same key.

Merge other DataFrames or Tables.

Examples:

df    = DataFrame.new(x: [1, 2], y: [3, 4])
other = DataFrame.new(a: ['A', 'B'], b: ['C', 'D'])
df.merge(other)
#=>
        x       y a        b
  <uint8> <uint8> <string> <string>
0       1       3 A        C
1       2       4 B        D

Parameters:

  • other (DataFrame, Arrow::Table, Array<DataFrame, Arrow::Table>)

    DataFrames or Tables to merge.

Returns:

Raises:



# File 'lib/red_amber/data_frame_combinable.rb', line 79

def merge(*other)
  case other
  in [] | [nil] | [[]]
    return self
  in [Array => array]
    # Nop
  else
    array = other
  end

  hash = array.each_with_object({}) do |e, h|
    df =
      case e
      when Arrow::Table
        DataFrame.create(e)
      when DataFrame
        e
      else
        raise DataFrameArgumentError, "#{e} is not a Table or a DataFrame"
      end

    if size != df.size
      raise DataFrameArgumentError, "#{e} do not have same size as self"
    end

    k = keys.intersection(df.keys).any?
    raise DataFrameArgumentError, "There are some shared keys: #{k}" if k

    h.merge!(df.to_h)
  end

  assign(hash)
end
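
The two preconditions in the notes (equal size, no shared keys) can be demonstrated with a small plain-Ruby sketch. It assumes a table is modelled as a Hash of column name to Array. `bind_columns` is a hypothetical helper, not RedAmber code, and unlike the listing above it reports the shared key names in its error message.

```ruby
# Hypothetical helper (not part of RedAmber): a table is modelled as a
# Hash of column name => Array of values.
def bind_columns(base, *others)
  size = base.values.first.size
  others.each_with_object(base.dup) do |cols, merged|
    unless cols.values.all? { |values| values.size == size }
      raise ArgumentError, 'columns do not have the same size as self'
    end

    shared = base.keys & cols.keys
    raise ArgumentError, "There are some shared keys: #{shared}" unless shared.empty?

    merged.merge!(cols)
  end
end

table = { x: [1, 2], y: [3, 4] }
other = { a: %w[A B], b: %w[C D] }

merged = bind_columns(table, other)
p merged.keys
```

The merged result carries the columns of `table` followed by the columns of `other`, while `table` itself is left unchanged.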

#right_join(other, suffix: '.1') ⇒ DataFrame #right_join(other, join_keys, suffix: '.1') ⇒ DataFrame #right_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

Join matching values from self to other.

  • Same as `#join` with `type: :right_outer`

  • A kind of mutating join.

Overloads:

  • #right_join(other, suffix: '.1') ⇒ DataFrame

    If `join_keys` is not specified, the keys common to self and other are used (natural keys). Returns the joined DataFrame.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    without key (use implicit common key)

    df.right_join(other)
    #=>
      KEY           X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    2 D          (nil) (nil)

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #right_join(other, join_keys, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with a key

    df.right_join(other, :KEY)
    #=>
      KEY           X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    2 D          (nil) (nil)

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_keys (String, Symbol, Array<String, Symbol>)

      A key or keys to match.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

  • #right_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df2 = DataFrame.new(KEY1: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY1          X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other2 = DataFrame.new(KEY2: %w[A B D], X2: [true, false, nil])
    #=>
      KEY2     X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with key pairs

    df2.right_join(other2, { left: :KEY1, right: :KEY2 })
    #=>
      KEY1          X1 X2
      <string> <uint8> <boolean>
    0 A              1 true
    1 B              2 false
    2 D          (nil) (nil)

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_key_pairs (Hash)

      Pairs of a key name or key names to match in left and right.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Options Hash (join_key_pairs):

    • :left (String, Symbol, Array<String, Symbol>)

      Join keys in `self`.

    • :right (String, Symbol, Array<String, Symbol>)

      Join keys in `other`.

    Returns:



# File 'lib/red_amber/data_frame_combinable.rb', line 414

def right_join(other, join_keys = nil, suffix: '.1')
  join(other, join_keys, type: :right_outer, suffix: suffix)
end

#semi_join(other, suffix: '.1') ⇒ DataFrame #semi_join(other, join_keys, suffix: '.1') ⇒ DataFrame #semi_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

Return records of self that have a match in other.

  • Same as `#join` with `type: :left_semi`

  • A kind of filtering join.

Overloads:

  • #semi_join(other, suffix: '.1') ⇒ DataFrame

    If `join_keys` is not specified, the keys common to self and other are used (natural keys). Returns the joined DataFrame.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    without key (use implicit common key)

    df.semi_join(other)
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

    • (DataFrame)

  • #semi_join(other, join_keys, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df = DataFrame.new(KEY: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other = DataFrame.new(KEY: %w[A B D], X2: [true, false, nil])
    #=>
      KEY      X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with a key

    df.semi_join(other, :KEY)
    #=>
      KEY           X1
      <string> <uint8>
    0 A              1
    1 B              2

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_keys (String, Symbol, Array<String, Symbol>)

      A key or keys to match.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Returns:

    • (DataFrame)

  • #semi_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame

    Returns Joined dataframe.

    Examples:

    df2 = DataFrame.new(KEY1: %w[A B C], X1: [1, 2, 3])
    #=>
      KEY1          X1
      <string> <uint8>
    0 A              1
    1 B              2
    2 C              3
    
    other2 = DataFrame.new(KEY2: %w[A B D], X2: [true, false, nil])
    #=>
      KEY2     X2
      <string> <boolean>
    0 A        true
    1 B        false
    2 D        (nil)

    with key pairs

    df2.semi_join(other2, { left: :KEY1, right: :KEY2 })
    #=>
      KEY1          X1
      <string> <uint8>
    0 A              1
    1 B              2

    Parameters:

    • other (DataFrame, Arrow::Table)

      A DataFrame or a Table to be joined with self.

    • join_key_pairs (Hash)

      Pairs of a key name or key names to match in left and right.

    • suffix (#succ) (defaults to: '.1')

      A suffix used to rename keys when key names conflict as a result of the join. `suffix` must respond to `#succ`.

    Options Hash (join_key_pairs):

    • :left (String, Symbol, Array<String, Symbol>)

      Join keys in `self`.

    • :right (String, Symbol, Array<String, Symbol>)

      Join keys in `other`.

    Returns:

    • (DataFrame)



# File 'lib/red_amber/data_frame_combinable.rb', line 467

def semi_join(other, join_keys = nil, suffix: '.1')
  join(other, join_keys, type: :left_semi, suffix: suffix)
end
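
Because a semi join is a filtering join, it only selects records of self and never adds columns from other. A minimal plain-Ruby sketch of that idea follows; `semi_join_rows` is a hypothetical helper operating on arrays of hashes, not RedAmber's implementation.

```ruby
# Hypothetical helper: keep the left rows whose key value also appears
# in the right rows. Nothing from the right rows is copied into the result.
def semi_join_rows(left_rows, right_rows, left_key:, right_key: left_key)
  right_values = right_rows.map { |r| r[right_key] }.uniq
  left_rows.select { |l| right_values.include?(l[left_key]) }
end

left  = [{ KEY: 'A', X1: 1 }, { KEY: 'B', X1: 2 }, { KEY: 'C', X1: 3 }]
right = [{ KEY: 'A', X2: true }, { KEY: 'B', X2: false }, { KEY: 'D', X2: nil }]

kept = semi_join_rows(left, right, left_key: :KEY)
kept.each { |row| p row }
```

Here `kept` holds the records for 'A' and 'B' with only the `KEY` and `X1` columns, mirroring the examples above. Swapping `select` for `reject` would give the anti-join counterpart.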

#set_operable?(other) ⇒ Boolean

Check if set operation with self and other is possible.

Examples:

df3 = DataFrame.new(
  KEY1: %w[A B C],
  KEY2: [1, 2, 3]
)
#=>
  KEY1        KEY2
  <string> <uint8>
0 A              1
1 B              2
2 C              3

other3 = DataFrame.new(
  KEY1: %w[A B D],
  KEY2: [1, 4, 5]
)
#=>
  KEY1        KEY2
  <string> <uint8>
0 A              1
1 B              4
2 D              5
df3.set_operable?(other3) #=> true

Parameters:

  • other (DataFrame, Arrow::Table)

    A DataFrame or a Table to be compared with self.

Returns:

  • (Boolean)

    true if set operation is possible.



# File 'lib/red_amber/data_frame_combinable.rb', line 529

def set_operable?(other) # rubocop:disable Naming/AccessorMethodName
  keys == other.keys.map(&:to_sym)
end
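
The check above compares two arrays with `==`, so the key names must match in the same order, and names given as strings are converted to symbols first. A small sketch with plain arrays standing in for the key lists (`same_keys?` is a hypothetical stand-in, not a RedAmber method):

```ruby
# Hypothetical stand-in for the key comparison used by set_operable?.
def same_keys?(self_keys, other_keys)
  self_keys == other_keys.map(&:to_sym)
end

string_names = same_keys?(%i[KEY1 KEY2], %w[KEY1 KEY2]) # string names are converted to symbols
reordered    = same_keys?(%i[KEY1 KEY2], %i[KEY2 KEY1]) # same names, different order
missing_key  = same_keys?(%i[KEY1 KEY2], %i[KEY1])      # one key is missing

p string_names # => true
p reordered    # => false
p missing_key  # => false
```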

#union(other) ⇒ DataFrame

Select records appearing in self or other.

  • Same as `#join` with `type: :full_outer` when the keys in self are the same as those in other.

  • A kind of set operation.

Examples:

df3 = DataFrame.new(
  KEY1: %w[A B C],
  KEY2: [1, 2, 3]
)
#=>
  KEY1        KEY2
  <string> <uint8>
0 A              1
1 B              2
2 C              3

other3 = DataFrame.new(
  KEY1: %w[A B D],
  KEY2: [1, 4, 5]
)
#=>
  KEY1        KEY2
  <string> <uint8>
0 A              1
1 B              4
2 D              5
df3.union(other3)
#=>
  KEY1        KEY2
  <string> <uint8>
0 A              1
1 B              2
2 C              3
3 B              4
4 D              5

Parameters:

  • other (DataFrame, Arrow::Table)

    A DataFrame or a Table to be joined with self.

Returns:

  • (DataFrame)



# File 'lib/red_amber/data_frame_combinable.rb', line 573

def union(other)
  unless keys == other.keys.map(&:to_sym)
    raise DataFrameArgumentError, 'keys are not same with self and other'
  end

  join(other, keys, type: :full_outer)
end
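
Since every column is used as a join key, a record present in both data frames matches itself and appears once in the result. The same set semantics can be sketched with plain Ruby arrays, where each record is modelled as an array of values; this is an illustration only, and the row order produced by RedAmber is not implied by this sketch.

```ruby
# Records of df3 and other3 from the example above, one array per row.
rows       = [['A', 1], ['B', 2], ['C', 3]]
other_rows = [['A', 1], ['B', 4], ['D', 5]]

# Array#| returns the set union: records from either side, duplicates removed.
union_rows = rows | other_rows
p union_rows
# => [["A", 1], ["B", 2], ["C", 3], ["B", 4], ["D", 5]]
```

The shared record `['A', 1]` is kept once, while `['B', 2]` and `['B', 4]` are distinct records because they differ in `KEY2`.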