Scrape YouTube href using jsoup

April 2019

I'm using jsoup in Java and I'm trying to scrape the first href from a particular YouTube video search. However, I can't figure out the correct CSS query to obtain the href. If someone can point me in the right direction, that'd be great. Here is the image of the HTML I'm trying to scrape on YouTube.

The following is one of the selectors I've tried, but it doesn't print anything.

My code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import java.io.IOException;

public class WebTest
{
    public static void main(String[] args)
    {
        try {
            Document doc = Jsoup.connect("https://www.youtube.com/results?search_query=childish+gambino+this+is+america").get();
            Elements musicVideoLink = doc.select("h3.title-and-badge.style-scope.ytd-video-renderer a[href]");

            String linkh = musicVideoLink.attr("href");
            System.out.println(linkh);
        }
        catch (IOException ex) { ex.printStackTrace(); } // report failures instead of silently ignoring them
    }
}
gio

1 answer

With Jsoup.connect(...).get(), the request carries no browser-style headers such as User-Agent, so YouTube returns quite a basic HTML rendering of the search results. Its structure is quite different from the one in the linked image above, but it is actually easier to select from:

Elements musicVideoLink = doc.select("h3.yt-lockup-title a");

This looks like the easiest solution here. If you do pass in a User-Agent header, you get back the same response that the Network tab in the browser inspector shows, but that still doesn't match the markup in the image. The browser evidently does some AJAX-style processing and rendering on that response afterwards.
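
To make the two options concrete, here is a rough sketch rather than a definitive implementation. It runs the simple selector against a made-up HTML fragment that imitates the basic markup (the real page may differ and can change at any time), and it shows where a User-Agent would be set if you wanted one. The class name FirstResultLink, the helper names firstVideoHref and buildRequest, the BASE_URL constant, the 10-second timeout and the EXAMPLE_ID placeholder are all my own assumptions for illustration.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstResultLink {

    // Assumed base URL, used only to resolve relative hrefs such as "/watch?v=...".
    static final String BASE_URL = "https://www.youtube.com/";

    /** Returns the absolute URL of the first matching link, or null if nothing matches. */
    static String firstVideoHref(Document doc) {
        Element link = doc.selectFirst("h3.yt-lockup-title a[href]");
        return link == null ? null : link.absUrl("href");
    }

    /** Builds (but does not execute) a request; pass null to leave the User-Agent untouched. */
    static Connection buildRequest(String url, String userAgent) {
        Connection connection = Jsoup.connect(url).timeout(10_000); // timeout value is an arbitrary choice
        if (userAgent != null) {
            connection.userAgent(userAgent);
        }
        return connection;
    }

    public static void main(String[] args) {
        // Made-up fragment imitating the basic markup; not copied from a real response.
        String html = "<h3 class=\"yt-lockup-title\">"
                + "<a href=\"/watch?v=EXAMPLE_ID\">Example title</a></h3>";
        Document doc = Jsoup.parse(html, BASE_URL);
        System.out.println(firstVideoHref(doc)); // prints https://www.youtube.com/watch?v=EXAMPLE_ID
    }
}
```

Against the live site you would replace the Jsoup.parse(...) call with buildRequest(searchUrl, null).get() and handle the IOException it can throw. Checking for null before reading the href also makes it obvious when the selector matched nothing, instead of printing an empty line.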