Merge two DataFrames on IP address column

Refresh

December 2018

Views

39 time

3

I have two DataFrames, df1 and df2, structured as follows:

ip_address            property_A
1.1.1.1               AAA
1.2.2.2               BBB
1.3.3.3               CCC
...                   ...
1.255.255.255.255     ZZZ

ip_address            property_B
1.1.1.1               YRG
1.2.2.2               HJK
1.3.3.3               KJH
...                   ...
1.255.255.255.255     TYU

And I want to merge them on the column "ip_address". Due to the nature of the data contained in that column, this command is failing:

pd.merge(df1, df2, on='ip_address', how='inner')

>> dtype: object does not appear to be an IPv4 or IPv6 address

A possible solution would be to convert IP addresses to integers using the ipaddress module as in this example:

import ipaddress
int(ipaddress.IPv4Address('192.168.0.1'))

>> 3232235521

To do this efficiently, I tried this command:

import numpy as np
import pandas as pd
df1['int_ip'] = np.nan
df1.int_ip = int(ipaddress.IPv4Address(df1.ip_address))

However, even this command is failing:

>> AddressValueError: Expected 4 octets in [...]

The only approach that seems to be feasible is the following:

for i in range(0, df1.shape[0]):
    df1.int_ip[i] = int(ipaddress.IPv4Address(df1.ip_address[i]))

But this one is extremely inefficient.

Do you have a better approach?

1 answers

1
d = {'ip_address': ['1.1.1.1', '2.2.2.2','3.3.3.3','1.255.255.255'], 'property_A': ['AAA','BBB','CCC','ZZZ']}
df1 = pd.DataFrame(data=d)
b = {'ip_address': ['1.1.1.1', '2.2.2.2','3.3.3.3','1.255.255.255'], 'property_B': ['YRG','HJK','KJH','TYU']}
df2 = pd.DataFrame(data=b)

Я хотел бы попробовать это:

df3= df1.merge(df2.set_index('ip_address'),
               left_on=df1.ip_address,
               right_index=True)

df1
    ip_address    property_A
0   1.1.1.1       AAA
1   2.2.2.2       BBB
2   3.3.3.3       CCC
3   1.255.255.255 ZZZ

df2    
    ip_address    property_B
0   1.1.1.1       YRG
1   2.2.2.2       HJK
2   3.3.3.3       KJH
3   1.255.255.255 TYU

df3
    ip_address    property_A    property_B
0   1.1.1.1       AAA           YRG
1   2.2.2.2       BBB           HJK
2   3.3.3.3       CCC           KJH
3   1.255.255.255 ZZZ           TYU