How do I behave responsibly when fetching a URL provided by a user?


April 2019




What problems am I likely to face / what should I consider? I'm starting from a point of rate-limiting per user, maybe overall, possibly per domain. I guess I'll parse_url(), make sure I set reasonable timeouts, etc. Is there a big class of security hole I need to watch out for?

1 answer


Is there a big class of security hole I need to watch out for?

Yes! I don't know if it's big, but for example: you probably don't want to accept file:// URLs (would you want to accept file:///etc/passwd? Probably not). You probably also don't want to accept other non-web protocols like imap / ldap / pop3 / rtmp / etc, do you? To be safe, I suggest making a protocol whitelist and rejecting any URL whose protocol isn't on it. For URLs where no protocol is specified, either default to http (because that's what libcurl does by default) or flat out reject them. For example:

    $errors[]="illegal protocol. legal protocols: ".implode(", ",$protocolWhiteList);
That way, only whitelisted protocols are allowed, and you'll probably avoid some security issues.
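The one-liner above only shows the error message, so here is a minimal sketch of how the whole whitelist check might look. The function name `checkUrlScheme()` and the default list of `http` and `https` are my assumptions, not something from the question, and this version takes the "flat out reject" option for URLs without a protocol.

```php
<?php
// Hypothetical helper (name and defaults are assumptions for illustration).
// Returns a list of error strings; an empty list means the scheme is acceptable.
function checkUrlScheme(string $url, array $protocolWhiteList = ['http', 'https']): array
{
    $errors = [];
    // parse_url() returns null when the URL has no scheme, false when it cannot parse the URL.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if ($scheme === null || $scheme === false) {
        // This sketch rejects scheme-less URLs instead of defaulting to http.
        $errors[] = "missing protocol. legal protocols: " . implode(", ", $protocolWhiteList);
    } elseif (!in_array(strtolower($scheme), $protocolWhiteList, true)) {
        // Schemes are case-insensitive, so compare in lowercase.
        $errors[] = "illegal protocol. legal protocols: " . implode(", ", $protocolWhiteList);
    }
    return $errors;
}
```

If you fetch with curl, it can enforce the same restriction itself through `CURLOPT_PROTOCOLS` and `CURLOPT_REDIR_PROTOCOLS` (for example `CURLPROTO_HTTP | CURLPROTO_HTTPS`), which also covers redirects that the check above never sees.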

Further, you may not want to allow users to use your curl code to communicate with servers on the same LAN, the same VPN, or even localhost (which may help hackers bypass firewalls/routers or reach VPN-only hosts), so I'd probably also deny those URLs, e.g.

    if ($host === "localhost" || ((false !== filter_var($host, FILTER_VALIDATE_IP)) && (false === filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE)))) {
        $errors[] = "localhost and LAN IP URLs are not allowed.";
    }
That way it gets much more difficult for hackers to use your curl system to bypass firewalls/NATs/whatever and reach system-local (only listening on localhost), LAN-local, or VPN-local services.
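Note that the check above only catches literal IP addresses: a hostname that resolves to a private address would pass. Below is a rough sketch of the same test wrapped in a function, plus a lookup of what the hostname resolves to. The names `isLocalOrPrivateHost()` and `hostIsAllowed()` are hypothetical, and the DNS step is my own assumption about how you might extend the check, not part of the snippet above.

```php
<?php
// Hypothetical helper: true when $host is "localhost" or a literal IP address
// in a private or reserved range (the same test as the snippet above).
function isLocalOrPrivateHost(string $host): bool
{
    // parse_url() leaves the brackets on IPv6 literals such as "[::1]".
    $host = trim($host, "[]");
    if (strcasecmp($host, "localhost") === 0) {
        return true;
    }
    if (filter_var($host, FILTER_VALIDATE_IP) === false) {
        return false; // not an IP literal; hostnames are resolved in hostIsAllowed()
    }
    return filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE) === false;
}

// Hypothetical extension (assumption): also reject hostnames that resolve to such addresses.
function hostIsAllowed(string $host): bool
{
    if (isLocalOrPrivateHost($host)) {
        return false;
    }
    if (filter_var(trim($host, "[]"), FILTER_VALIDATE_IP) !== false) {
        return true; // a public IP literal, nothing to resolve
    }
    $ips = gethostbynamel($host); // list of IPv4 addresses, or false on failure
    if ($ips === false) {
        return false; // unresolvable hosts are rejected in this sketch
    }
    foreach ($ips as $ip) {
        if (isLocalOrPrivateHost($ip)) {
            return false;
        }
    }
    return true;
}
```

This is only a sketch: `gethostbynamel()` returns IPv4 addresses only, and curl does its own lookup when it actually fetches, so the check narrows the gap rather than closing it.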

There could very well be more stuff you should account for, but this is everything I could think of off the top of my head.