Questions tagged [jsoup]

0

votes
0

answer
4

Views

Scraping web with java and downloading a video

I'm trying to scrape this 9gag link. I tried using JSoup to get this HTML tag so I can take the source link and download the video directly. I tried with this code: public static void main(String[] args) throws IOException { Response response = Jsoup.connect("https://9gag.com/gag/a2ZG6Yd") .ignoreConte...
tassi4224
0

votes
0

answer
9

Views

Getting past a login page in order to webscrape using JSoup

Sorry in advance if I am going about this completely wrong, as I am very new to jsoup. I am trying to log in to my school's grade website in order to scrape grade data from it and display it in a program. The link to the login portal is 'https://portal.mcpsmd.org/public/' however the page I am trying to...
James McGreivy
1

votes
1

answer
1.8k

Views

Extract link of background, jsoup

I have a problem extracting a link from HTML of the following nature, using jsoup.
jumper0k
1

votes
0

answer
183

Views

How to get web page content continuously without refreshing the URL in an Android WebView?

I am working on an Android application with a WebView. There is one page whose URL stays the same while the content below changes continuously. How do I check the changing content of the page in the WebView so that I can catch a success message? Kindly help me; I searched a whole day but did not find a solution. webview = (We...
Meerz
1

votes
0

answer
53

Views

jsoup parsing google.com search returns different results on win / linux

I have the following code, running on my local Win 10 machine within Eclipse (java 7) and then deployed to a Red Hat server as a servlet running on Tomcat 7: Document doc = Jsoup.connect('https://www.google.com/search?q=best+hotel+chicago&start=1' + i).ignoreHttpErrors(true).referrer('https://www.go...
JamesJ
1

votes
0

answer
83

Views

Jsoup Login to avoid 401 HttpStatusException

I am making a program for use at my work that extracts lead info from our HTML lead screen. In order to do so, I am using Jsoup to parse the HTML document and search for the specified information. When I go to pull up the page through my web browser like normal, it prompts me with a login before act...
Brandon Woodruff
1

votes
1

answer
93

Views

JSoup not showing all the data

Here is my jsoup code: Document document = Jsoup.connect('https://www.aliexpress.com/category/200214036/women-watches.html?spm=2114.search0103.3.7.765d221bi3J3Io&site=glo&g=y').get(); Elements titleElement = document.select('div.item > div.img > div.pic > a.picRind > img'); String essay = essayEleme...
Zishan
1

votes
2

answer
30

Views

Jsoup - cannot select table

I am trying to print out the rows from the table on 'https://www.worldcoinindex.com/'. From Chrome's inspect element I can see that the tableID='myTable'. However, when I try to perform the select table method, it returns an Index out of bounds exception. I am able to print individual rows using...
John Stevens
1

votes
0

answer
48

Views

Finding a specific value on a page with XSoup

I've used JSoup a bit before but this is my first time trying XSoup. I'm building a program that tracks the accuracy of this price analysis tool for Bitcoin (https://www.tradingview.com/symbols/BTCUSD/technicals/). The end goal is to grab what this site is recommending (Buy, strong buy, sell, strong...
Moist Von Lipvig
1

votes
1

answer
119

Views

Java connection fails from jar but works from Eclipse

I have a Java multithread application that connects to the web. After I start some number of threads, if I run it from an executable jar, the application hangs. If I run it from Eclipse it works perfectly. This is the instruction that connects to the web: Jsoup.connect(url).header('Cache-Control', '...
Luke
1

votes
0

answer
178

Views

How to hide header, footer and other tags with WebView in Android?

I have used Jsoup to hide HTML tags (header, search panel, footer). All works fine. But when I click any item, the new page shows the footer and header again. How can I hide the header and footer for all operations? Thanks in advance :) public class MainActivity extends AppCompatActivity { WebVie...
Fuad Alizadeh
1

votes
1

answer
134

Views

Jsoup different results Windows & Linux

I have written a function for parsing HTML from a specific URL using Jsoup (1.11.2); see the code below. I have an OS issue: on Windows 10 it works perfectly fine, but I don't get the full content when executing on Linux. Can someone explain why I get different results? public Document getJsoup(...
wolteeer
1

votes
1

answer
51

Views

Changing values on web page before scraping it

I'm trying to scrape the following page using JSoup: https://basketballmonster.com/PlayerRankings.aspx But before scraping the page, I'd like to change the value of 'Past Games' to 5, and select 'All Players' instead of 'Top Players'. I've been able to scrape plenty of pages with JSoup without issue...
dajadf
1

votes
0

answer
93

Views

In Jsoup, why does getElementById() return a null object, and why is the given HTML string printed twice with a syntax error when .outerHTML() is called?

I am developing a Java application that reads HTML tags, using the jsoup library. My problem: I created an Element by passing HTML content as a string. The HTML tag: Sathish Kumar Yes. Let develop this. Like · Reply · 5 · 5 mins The HTML string object: public st...
Lord Commander
1

votes
1

answer
106

Views

I cannot log in to a website in Android using Jsoup

I am using Jsoup to try to log in to this website. I am using the following code: Connection.Response res = Jsoup.connect('https://www.interpals.net/') .userAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36') .method(Connection...
David
1

votes
0

answer
29

Views

jsoup - Would it be possible to get the parent page of a page?

I am capturing website content using a crawler that is composed of jsoup. Would it be possible for me to get the parent page of a certain page? I am talking about the hierarchical structure of a website. See image: Website Hierarchical Structure For example, 'About Company''s parent is 'Homepage'....
lei
1

votes
1

answer
94

Views

Jsoup is not stripping escaped html characters

I have a standard JSON structure inside which I have content like this: This is html content. I am using jsoup to strip the tags. However, I am getting the output below: This is html content. Jsoup is not able to strip the end tags that have escaped characters. Note: A standard json data format wi...
Hemanth
1

votes
0

answer
21

Views

How do RSS readers fetch field values when tag names vary across domains and do not follow a standard set of keys?

I am trying to implement an RSS reader. When I try to fetch values such as image, URL, pubdate, guid, description, content, etc., these values come in different tags and don't follow a standard set of keys. Currently I use a switch statement with a few common keys to get values, even though there are som...
Arivarasan Ram
1

votes
0

answer
32

Views

Dynamically loaded content scraping

I'm trying to create a simple site scraper using Java and jsoup or htmlunit. I have chosen vk.com as a target site. My goal is to go through the audio tracks and download them. I started investigating and figured out that there are no track URLs in the DOM. A track is somehow dynamically downloaded after...
Paul Luchin
1

votes
1

answer
83

Views

HTML parsing Android Jsoup

I'm kind of new to Android. I'm trying to use jsoup to parse an HTML page to gather some info from it. I would like to enter a URL via a pop-up (alert box) using a method called loadWebsite: private void loadWebsite(){ AlertDialog.Builder builder = new AlertDialog.Builder(this); builder.setTitle("Inser...
Daniele Asta
1

votes
1

answer
39

Views

Getting Data from multiple a tags in HTML

I am scraping a medical website where I need to extract header-wise information regarding a drug, e.g. Precautions, Contraindications, Dosage, Uses, etc. The HTML data looks like below. If I just extract info using the tag p.drug-content I get content under all the headers as one big paragraph. How do I...
serendipity
1

votes
1

answer
372

Views

Jsoup Parsing double quotes as &quot; and single quotes as double quotes

I am trying to parse an HTML document. In the document, there is the span-data-personalization = '{"one":["two"]}' which converts to span-data-personalization = "{&quot;one&quot;:[&quot;two&quot;]}" while parsing. The double quotes convert to &quot; and single quotes to double quotes. I have also used...
Shashank S
1

votes
1

answer
141

Views

How to get the last 5 articles from a website with Jsoup

I'm currently working on a Java desktop app for a company, and they asked me to extract the last 5 articles from a web page and display them in the app. To do this I need an HTML parser, of course, and I thought directly about JSoup. But my problem is how do I do it exactly? I found one easy examp...
Laz22434
1

votes
2

answer
166

Views

Jsoup crawler and HTTP error fetching URL

I am writing a crawler with Jsoup and this is the HTTP error I get: org.jsoup.HttpStatusException: HTTP error fetching URL. Status=404, URL=https://www.mkyong.com/spring-boot/spring-boot-hibernate-search-example/%E2%80%9Chttp:/wildfly.org/downloads/ at org.jsoup.helper.HttpConnection$Response.execut...
Anna Noukou
1

votes
0

answer
94

Views

Fetch latest version of Android application in Play Store to ask for update

I have been implementing an app update prompt for users by detecting the latest version of the application in the Play Store using the jsoup library. But lately it has been causing some crashes in my apps, so I had to remove it. I was wondering if anyone can shed some light on how the same thing can be achieved using differen...
Kushal Mehta
1

votes
0

answer
63

Views

Android http request not working with WiFi but working on mobile data

I'm having trouble getting a response from a http request when the phone is connected via WiFi. When it's using mobile data everything works fine. Here is the code: doc = Jsoup.connect('https://www.facebook.com/') .userAgent('Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 F...
Andrei Nutu
1

votes
3

answer
38

Views

Jsoup Html parsing query

I'm a newbie with Jsoup. I would like to parse this code: (27 apr 2018 19:17:55 CEST) in order to get: 27 apr 2018 19:17:55 CEST. Any tips?
Daniele Asta
1

votes
1

answer
174

Views

Understanding User-Agents when scraping URL data using JSoup

My basic goal is to create something like a typical web-unfurl project for web links. For that, I am using JSoup. For the User-Agent, I was initially using the string - Mozilla/5.0 (Macintosh; U; Intel Mac OS X; de-de) AppleWebKit/523.10.3 (KHTML, like Gecko) Version/3.0.4 Safari/523.10 for the fol...
gravetii
1

votes
0

answer
733

Views

Web scraping with Jsoup in Kotlin

I am trying to scrape this website as part of my lesson to learn about Kotlin and web scraping (using jsoup). What I am trying to scrape is the Jackpot $1,000,000 est. values. The code below is something that I wrote after searching and checking out a couple of tutorials online, but...
jake wong
1

votes
0

answer
413

Views

How to read a brotli compressed string?

I'm getting a brotli compressed json string from a website. I want to decompress and read it. When I use input stream from response, I'm able to read it properly using new BufferedReader(new InputStreamReader(new BrotliInputStream(response.getEntity().getContent()))); Whereas when I have saved the r...
1

votes
1

answer
226

Views

Scrape youtube href using jsoup

I'm using jsoup in java and I'm trying to scrape the first href in a particular youtube video search. However, I can't figure out the correct css query in order to obtain the href. If someone can point me in the correct direction, that'd be great. Here is the image of the html I'm trying to scrape o...
gio
1

votes
0

answer
65

Views

JSoup match exact text with space in it

I'd like to select only an exact String (that has spaces in it) using Jsoup. I'm using matches(^String with spaces$) but the selector is matching text in the DOM that has the word 'String' and not exactly 'String with spaces'. For example: Alzheimer's symptom Alzheimer's caregivers should make...
ihossain
1

votes
0

answer
63

Views

Android - Extract html with tag and make count on result

I have a JSON object that returns a long string with HTML content. Is there any way I can get certain text from this HTML string and put it into a TextView? What I want to get would probably be the content under & and throw all others away. \r\n\r\n \r\n \r\n Latest Deals\r\n...
Shawn.Y
1

votes
1

answer
36

Views

ASPX page with spaces in URL cannot be read

I have a problem with spaces in the URL of an .aspx page. I replace the spaces with replace(' ','%20') When connecting via Jsoup.connect(URL).get() or HttpURLConnection urlConn = (HttpURLConnection) URL.openConnection() I get the following error: Server returned HTTP response code: 400 for URL: ht...
carwah
1

votes
0

answer
44

Views

Dividing the texts on Textview

I want to divide the text in a TextView so that clicking an individual sentence opens it in a different modal, e.g. splitting at each full stop. The text in the TextView was pulled via jsoup. I fetch data with jsoup and combine the texts into paragraphs. I want information about that statement to be pri...
Emre Aydemir
1

votes
1

answer
191

Views

Nutch login to website for crawling

I need to crawl the posts of a website, https://hl.com, using Nutch, but this website asks for a login on certain pages, such as profiles and certain posts. So I need to authenticate first. I tried the code below, but it's not working; I am getting blank HTML. String url="https://hl.com/user/Joanne74"; Connecti...
kumst
1

votes
1

answer
145

Views

How to add paging with JSoup

I am working with JSoup and I was able to list the data of a website in a RecyclerView. But I'd like to add paging to it. I followed the tutorial below, but it only shows the first few pages. Anyone know how I can do it? Tutorial: http://www.yudiz.com/pagination-data-scraping-in-android-using-jsoupj...
OliverDamon
1

votes
0

answer
214

Views

JSoup - unable to tunnel through authenticated proxy

I'm having trouble with an issue similar to the one described at Unable to tunnel through proxy - Jsoup. When trying to connect to a test page, I get this exception: Exception in thread "main" java.io.IOException: Unable to tunnel through proxy. Proxy returns "HTTP/1.1 407 Proxy Authentication Required" My ProxyAuth...
lo0p3r
1

votes
1

answer
108

Views

Parsing XHTML with Jsoup 1.11

I am trying to parse an XHTML file with Jsoup and it's stripping the closing slash on some of my tags. ie: becomes I have tried some of the other answers here: Jsoup: How to convert a String containing HTML to a XHTML document? https://github.com/jhy/jsoup/issues/511 jsoup: differnt result after upda...
Al Grant
1

votes
2

answer
388

Views

How to turn off SSL certificate validation and get the login response using JSOUP/Rest Assured API?

When I tried turning off SSL certificate validation in Postman, I got the response, but I am not sure how to turn off SSL certificate validation through code. I have used the code below (JSOUP) and I am getting 'javax.net.ssl.SSLHandshakeException:' public String Login () throws Exception { Conne...
qsg testing
