I have two DataFrames, df1 and df2, structured as follows:

    ip_address          property_A
    188.8.131.52        AAA
    184.108.40.206      BBB
    220.127.116.11      CCC
    ...                 ...
    18.104.22.168.255   ZZZ

    ip_address          property_B
    22.214.171.124      YRG
    126.96.36.199       HJK
    188.8.131.52        KJH
    ...                 ...
    184.108.40.206.255  TYU

And I want to merge them on the column "ip_address". Due to the nature of the data contained in that column, this command is failing:

    pd.merge(df1, df2, on='ip_address', how='inner')
    >> dtype: object does not appear to be an IPv4 or IPv6 address

A possible solution would be to convert IP addresses to integers using the
ipaddress module as in this example:

    import ipaddress
    int(ipaddress.IPv4Address('192.168.0.1'))
    >> 3232235521

To do this efficiently, I tried this command:

    import numpy as np
    import pandas as pd
    df1['int_ip'] = np.nan
    df1.int_ip = int(ipaddress.IPv4Address(df1.ip_address))

However, even this command is failing:
>> AddressValueError: Expected 4 octets in [...]
The only approach that seems to be feasible is the following:

    # df1.shape is a (rows, columns) tuple, so take the row count explicitly
    for i in range(0, df1.shape[0]):
        df1.loc[i, 'int_ip'] = int(ipaddress.IPv4Address(df1.ip_address[i]))

But this one is extremely inefficient.
Do you have a better approach?
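
One direction that might be worth checking is to apply the conversion element by element with `Series.map`, instead of passing the whole column to `IPv4Address` (which expects a single address, not a Series). The sketch below assumes every value in `ip_address` is a valid dotted-quad IPv4 string. The sample frames, their addresses, and the helper name `ip_to_int` are made up for illustration and are not the real data.

```python
import ipaddress

import pandas as pd

# Hypothetical sample data standing in for the real df1 and df2.
df1 = pd.DataFrame({
    "ip_address": ["192.168.0.1", "10.0.0.1", "172.16.0.5"],
    "property_A": ["AAA", "BBB", "CCC"],
})
df2 = pd.DataFrame({
    "ip_address": ["10.0.0.1", "192.168.0.1", "8.8.8.8"],
    "property_B": ["YRG", "HJK", "KJH"],
})


def ip_to_int(ip: str) -> int:
    """Convert one dotted-quad IPv4 string to its integer value."""
    return int(ipaddress.IPv4Address(ip))


# Apply the conversion to each element rather than to the Series as a whole.
for df in (df1, df2):
    df["int_ip"] = df["ip_address"].map(ip_to_int)

# Merge on the integer key; drop the duplicate string column from df2 first.
merged = pd.merge(df1, df2.drop(columns="ip_address"), on="int_ip", how="inner")
print(merged)
```

Note that `Series.map` still calls the function once per row in Python, so this is tidier than the explicit loop rather than truly vectorized.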