Questions tagged [beautifulsoup]

1

votes
1

answer
50

Views

How to save the URLs from a for loop into a single variable?

I want to store multiple URLs in a single variable, 'URLs'. Each URL is made up of three parts, 'urlp1', 'n' and 'urlp2', as you can see in the code below. urlp1 = 'https://www.proteinatlas.org/' URLs = [] for cancer in cancer_list: urlp2 = '/pathology/tissue/' + cancer[1] f = cancer[0] t...
Rujun Guan
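
A minimal sketch of collecting every URL built in the loop into one list. The excerpt is truncated, so the shape of `cancer_list` (pairs of identifier and cancer name), the sample values, and the helper name `build_urls` are assumptions made for illustration, not the asker's actual data.

```python
# Base URL taken from the question; everything else below is illustrative.
urlp1 = "https://www.proteinatlas.org/"

# Hypothetical sample data: (identifier, cancer name) pairs.
cancer_list = [
    ("GENE_A", "liver+cancer"),
    ("GENE_B", "lung+cancer"),
]


def build_urls(base, entries):
    """Join base + identifier + path for every entry and return one list."""
    urls = []
    for identifier, cancer in entries:
        urlp2 = "/pathology/tissue/" + cancer
        urls.append(base + identifier + urlp2)
    return urls


URLs = build_urls(urlp1, cancer_list)
print(URLs)
```

Appending inside the loop, rather than reassigning a string on each pass, is what keeps all the URLs instead of only the last one.
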
1

votes
2

answer
73

Views

Web scraping tables using Python

I have been trying to extract a table from the Wikipedia list of Nobel laureates. The table has some None values, and I don't know how to handle them while looping through the cells. How can I include the None values in the table? The link to the Wikipedia page is: https://en.wikipedia.org/wiki/L...
Aamir
2

votes
1

answer
16

Views

How do I implement a breadth first and depth first search web crawler?

I am attempting to write a web crawler in Python with Beautiful Soup in order to crawl a webpage for all of the links. After I obtain all the links on the main page, I am trying to implement a depth-first and breadth-first search to find 100 additional links. Currently, I have scraped and obtained t...
dacoda007
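
One way to sketch the two traversal orders asked about above is a single loop over a frontier that is consumed from the front (breadth-first) or from the back (depth-first). The function name `crawl`, the `get_links` callback, and the in-memory `graph` are hypothetical stand-ins; in a real crawler `get_links` would fetch the page and collect its anchors with Beautiful Soup.

```python
from collections import deque


def crawl(start, get_links, limit=100, strategy="bfs"):
    """Return up to `limit` distinct URLs reachable from `start`, in visit order."""
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier and len(order) < limit:
        # BFS takes the oldest discovered URL, DFS the most recent one.
        url = frontier.popleft() if strategy == "bfs" else frontier.pop()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical link graph standing in for real pages.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}


def get_links(url):
    return graph.get(url, [])


print(crawl("a", get_links, strategy="bfs"))
print(crawl("a", get_links, strategy="dfs"))
```

The depth-first variant here is the iterative, stack-based one, so it follows the most recently discovered link first. The `seen` set prevents revisiting pages that link to each other.
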
1

votes
0

answer
556

Views

Beautiful Soup prettify format only string values

I am using Beautiful Soup 4 to parse and modify a couple of Angular templates (HTML files). I have some issues when using the prettify function to write the modified content back into the file. This issue is related to special characters such as: >,
Tudor Ciotlos
1

votes
1

answer
206

Views

Python extract empty tag with beautifulsoup

I have the following loop, which works to extract particular tags and input them into a .csv file for all files in a directory. However, some files have empty tags and I obtain the following error Traceback (most recent call last): File 'newsbank2.py', line 27, in author = fauthor.text Attrib...
Amanda Maull
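
The truncated traceback above ends at `fauthor.text`, which suggests the lookup returned `None` for some files. A minimal sketch of guarding against that follows; the tag names, the sample markup, and the helper `tag_text` are assumptions, since the asker's files are not shown.

```python
from bs4 import BeautifulSoup


def tag_text(soup, name, default=""):
    """Return the stripped text of the first `name` tag, or `default` if absent."""
    tag = soup.find(name)
    return tag.get_text(strip=True) if tag is not None else default


# Hypothetical documents: one with an author tag, one without.
full = BeautifulSoup("<doc><author>Jane Doe</author></doc>", "html.parser")
empty = BeautifulSoup("<doc><title>No byline</title></doc>", "html.parser")

print(tag_text(full, "author"))
print(tag_text(empty, "author", default="unknown"))
```

Writing the default value to the CSV keeps every row the same width even when a file lacks the tag.
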
1

votes
1

answer
327

Views

Extract number from a website using beautifulsoup?

The following Python code: from bs4 import BeautifulSoup div = '查看: 15660|回复: 435' soup = BeautifulSoup(div, 'lxml') hm = soup.find('div', {'class': 'hm'}) print(hm) The output that I want is the two numbers, in this case: 15660 435 I want to try to extract the numbers from the website using beautifu...
wen tian
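
Once the text of the element is available, the two numbers can be pulled out with a regular expression. This is a sketch on the plain string from the question; the helper name `extract_numbers` is illustrative, and with a real page the input would presumably be the text of the matched div.

```python
import re


def extract_numbers(text):
    """Return every run of digits in `text` as an int, in order of appearance."""
    return [int(match) for match in re.findall(r"\d+", text)]


text = "查看: 15660|回复: 435"
numbers = extract_numbers(text)
print(numbers)
```
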
1

votes
1

answer
34

Views

Python Scraper - Find Data in Column

I am working on my first website scraper and am trying to get the number 41,110 that is saved in a column on the webpage https://mcassessor.maricopa.gov/mcs.php?q=14014003N. Below is my code. How can I get to this number and print it? from bs4 import BeautifulSoup import requests web_page = 'https...
Taylor29
1

votes
0

answer
48

Views

BeautifulSoup4 Not detecting Select tag

I am trying to learn web scraping, and I need to get all the select tags in the web page, but they are not detected by BeautifulSoup4. Here is my code in Python 3: from urllib.request import Request, urlopen from bs4 import BeautifulSoup import lxml req = Request('https://www.nseindia.com/'...
Abhishek Gangadhar
1

votes
1

answer
72

Views

Trouble targeting a certain HTML tag with Python, requests, and BeautifulSoup

I'm writing an app in Python using requests and BeautifulSoup and have encountered a problem finding the text of a specific element. Essentially you enter a zip code and it requests a Bing search (Bing has easier to use search query URLs than Google) for '[zip code] weather'. I'm able to pull the...
joon_bug
1

votes
0

answer
2.3k

Views

Discord does not embed link when sent by my bot

My code works fine and the bot sends the link, but Discord does not recognize it as one and does not embed it. When I copy and paste it myself, it then recognizes it as a link and embeds the image. Here is my code: if message.content.startswith('.dog'): response = requests.get('https://dog.ceo/api/br...
Mark W
1

votes
1

answer
105

Views

Unable to run BeautifulSoup in Python Virtual Environment

I've installed BeautifulSoup4 in my virtual environment as shown below. I'm able to import and use it normally inside my virtual environment interpreter; however, the interpreter couldn't find BeautifulSoup when I run the script directly. I haven't installed BeautifulSoup in my native environment. H...
dreamzboy
1

votes
1

answer
195

Views

Scraping data from table with unique IDs

I am trying to scrape from this website. My objective is to collect the most recent 10 results (win/loss/draw) of ANY team, I am just using this specific team as an example. The source for an individual row is: Sat 13/01/18 PRL Tottenham Hotspur 4 - 0 Everton View events More info You can see in the...
jonh98
1

votes
2

answer
95

Views

Data Extraction using Beautiful Soup : Data Visible on Website But No Text or Value present in HTML Tags

I am trying to extract data from a website, but I am unable to extract the text from the HTML. I am using Python, Selenium and Beautiful Soup to extract the data. I checked with jQuery using a CSS selector. How do I select the value using Python, given that it works in jQuery?
Abhishek Choudhary
1

votes
2

answer
366

Views

Getting individual data from a <li> as text using BeautifulSoup Python

This is a section of the HTML file I want to get data from. 2002 (02 reg) Hatchback 115,000 miles Manual 1.8L 123 bhp Petrol This is how I extracted the <ul> from the rest of the document: soup = BeautifulSoup(page.content, 'html.parser') vehicle_details = soup.find_all('ul', class_='listing-key-specs')...
stackoverflow1234
1

votes
0

answer
99

Views

HTML split on a given character

I am using Beautiful Soup to read the HTML of a page. req = urllib.request.Request('https://en.wikipedia.org/wiki/Barack_Obama', headers = headers) html = urllib.request.urlopen(req) page = BeautifulSoup(html,'html.parser') I want to split the HTML code on a period on the condition that it does no...
Kiran Baktha
1

votes
0

answer
36

Views

How to specify the link I want to download the file from in Python 3

I found several questions related to mine, but none helped me. I have to download .hdf files for a few years. This year I have at least 3 files per month. I tried to loop in the shell, but it will take 1000 years until I can download all the files, and on top of that it will saturate my bandwidth. When I acces...
Lucas Fagundes
1

votes
1

answer
421

Views

Beautifulsoup scraping table from website with requests for pandas

I am trying to download the data on this website https://coinmunity.co/ in order to manipulate it later in Python or Pandas. I have tried to load it directly into Pandas via Requests, but that did not work, using this code: res = requests.get('https://coinmunity.co/') soup = BeautifulSoup(res.content, 'lxm...
skeitel
1

votes
2

answer
792

Views

Parse “<tbody> / <tr> / <td>” with python's BeautifulSoup

I have the following HTML code: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa 62e907b15cbf27d5425399ebf6f0fb50ebb88f18 66.67711246 BTC 66.67711246 BTC 1089 12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX 119b098e2e980a229e139a9ed01a469e518e6f26 50.05723154 BTC 50.05723154 BTC 55 I want to parse it to get something like...
bsteo
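
A minimal sketch of walking `<tr>` and `<td>` elements into a list of rows. The markup in the question was lost in extraction, so the table below is reconstructed from the visible cell values; where the cell boundaries fall is an assumption.

```python
from bs4 import BeautifulSoup

# Reconstructed markup (assumed layout), using the values shown in the question.
html = """
<table><tbody>
<tr><td>1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa</td>
    <td>62e907b15cbf27d5425399ebf6f0fb50ebb88f18</td>
    <td>66.67711246 BTC</td><td>66.67711246 BTC</td><td>1089</td></tr>
<tr><td>12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX</td>
    <td>119b098e2e980a229e139a9ed01a469e518e6f26</td>
    <td>50.05723154 BTC</td><td>50.05723154 BTC</td><td>55</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# One inner list per table row, one stripped string per cell.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in soup.find_all("tr")
]

for row in rows:
    print(row)
```
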
1

votes
1

answer
68

Views

Scrapy or BeautifulSoup for a custom scraper?

I need guidance in developing a scraper. I need to build a custom scraper that retrieves all products from 3 e-commerce websites. I built the PoC scraper with Scrapy; however, there is a flaw with this scraper: The scraper needs to crawl a given category up to scrape depth level 3 in order to reach...
Chris
1

votes
0

answer
169

Views

Using FileStorage objects in Flask to read non-UTF-encoded XML files

Warning: This is not purely a code question; it is about knowing 'why' something actually works. I am trying to use Flask to parse an XML file the user imports using a form without needing to save it anywhere on the server. After reading this thread: Read file data without saving it in Flask I've under...
comendeiro
1

votes
1

answer
32

Views

How do I Access Specific Elements of Webpage for Import into Pandas

I have this code that scrapes a website for menu information. I have got it working so that it gets the text from this week's menu items: #Weekly Breakfast Menu import requests from bs4 import BeautifulSoup page = requests.get('https://trinity.campusdish.com/Commerce/Catalog/Menus.aspx?LocationId=103...
Christopher Reid
1

votes
1

answer
33

Views

Beautifulsoup: extracting links from a file that links were already taken from

I'm trying to write a web searching algorithm, and on my first pass through the site, I call BeautifulSoup on it. I then use find_all on it and it returns a list of 'a' tags. Within each 'a' tag there is a collection of data, but I'm trying to create a list of the URLs. Here's my code: soupcurren...
Ryan
1

votes
0

answer
69

Views

Scraping a website for specific data where URLs are inconsistent

I want to scrape http://www.narrpr.com/ for data, but I'm running into an issue. Most of the time, formatting URLs to access the specific pages you want to scrape is easy. However, in this instance, the URLs are formatted in the following fashion (for example): http://www.narrpr.com/homes/mo/indepen...
dougdimmadome
1

votes
1

answer
58

Views

Remove text outside of tags with bs4

I want to delete the text Página consultada el: but I don't know how because it's outside any tag. I've tried with this but nothing changes: for b in soup.find('br'): if( b.nextSibling == 'Página consultada el:'): b.nextSibling.replaceWith('') if(b.previousSibling == 'Página consultada el:'): b....
Shai Lèger
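
Text that sits outside any tag is still a node in the tree, so one option is to search for the string itself rather than for a tag. This is a sketch on made-up markup (the asker's page is not shown), and it removes the whole text node that contains the phrase.

```python
from bs4 import BeautifulSoup

TARGET = "Página consultada el:"

# Hypothetical markup with the phrase sitting between two <br> tags.
html = "<div><b>Fuente</b><br/>Página consultada el:<br/><i>2018</i></div>"
soup = BeautifulSoup(html, "html.parser")

# find_all(string=...) returns the matching text nodes, not their parent tags.
for node in soup.find_all(string=lambda s: s is not None and TARGET in s):
    node.extract()  # detach the text node from the tree

print(soup)
```

If other text shares the node with the phrase, replacing the node with a trimmed copy of its text would be the gentler choice.
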
1

votes
0

answer
49

Views

How to find script in HTML?

I am trying to retrieve the contents of the following tag and turn it into a JSON object:
wit221
1

votes
2

answer
292

Views

BeautifulSoup error when running any program

Whenever I run any program, I get the following warning: C:\Python36\lib\site-packages\bs4\__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available html parser for this system ('lxml'). This usually isn't a problem, but if you run this code on another system, or in...
Priyanshu Vaya
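
The warning quoted above is raised when `BeautifulSoup` is called without naming a parser. A minimal sketch of passing one explicitly follows; `"html.parser"` ships with Python, and `"lxml"` can be used instead if that package is installed. The sample markup is illustrative.

```python
from bs4 import BeautifulSoup

html = "<p>Hello, <b>soup</b>!</p>"

# Naming the parser makes the result the same on every machine
# and stops the "No parser was explicitly specified" warning.
soup = BeautifulSoup(html, "html.parser")

print(soup.p.get_text())
```
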
1

votes
2

answer
305

Views

Python: BeautifulSoup not always getting all text data

I've got a strange problem with my web scraper. I am trying to get the data from a website using BeautifulSoup. My code works on 90% of all links I've tried, but on a few it does not read the page fully. The text that interests me is '1152x864'. When checking the source code in my browser I clearly...
Erik
1

votes
1

answer
177

Views

Remove blanks from beautifulsoup get_text

I'm piecing together some stuff to try and get clean text from a website using beautifulsoup get_text. In the past I have found it often comes back with some stuff that isn't what I needed so I've set about trying to make it as clean as possible. My issue is, in what is returned I get some blank v...
rlou
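
A small sketch of dropping the blank entries from text such as the output of `get_text`. It works on a plain string, so the sample `raw` value and the helper name `drop_blank_lines` are illustrative only.

```python
def drop_blank_lines(text):
    """Strip each line and keep only the ones that still contain something."""
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical text with empty and whitespace-only lines.
raw = "Title\n\n   \nFirst paragraph\n\t\nSecond paragraph\n"
clean = drop_blank_lines(raw)
print(clean)
```
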
1

votes
0

answer
56

Views

Parsing HTML Tables with Python

Below is the code. I've also attached a photo of the table here. The issue I am having is tr[1] has the column headers or th tags, then tr[1:40] has row headers with th tags followed by td and th tags that correspond to the numbers in the table. td are normal numbers, but th are different because th...
Josh Pilson
1

votes
2

answer
163

Views

Cannot scrape a table using BeautifulSoup

From the code below: I only managed to get 1 row of data url = 'http://investmentmoats.com/DividendScreener/DividendScreener.php' res = requests.get(url) soup = BeautifulSoup(res.content,'lxml') table = soup.find_all('table')[0] df = pd.read_html(str(table))[0] Can someone help?
lrh09
1

votes
1

answer
145

Views

Using BeautifulSoup to match a string in an HTML document and highlight it wherever it appears

I am trying to match a string in an HTML document and highlight it specifically. I have used BeautifulSoup along with html.parser. What I have tried so far is using find_all() and passing the string to be matched, but it doesn't help as it returns the whole text present in the element. I want you to g...
subodhkalika
1

votes
1

answer
109

Views

How can I parse the table content from the website using Selenium?

I'm trying to parse the tables on a sports website into a list of dictionaries to render into a template. This is my first exposure to Selenium; I read the Selenium documentation and wrote this program: from bs4 import BeautifulSoup import time from selenium import webdriver url = 'http://www.e...
steve
1

votes
1

answer
154

Views

Openstreetcam extract image and gps location

I would like to use python to download the image and sequences of images found in the location on openstreetcam. http://openstreetcam.com/details/8552/422 I figured out the image is saved under http://api.openstreetcam.org/files/photo/2016/6/30/lth/8552_2fbf0_57756eba868e9.jpg?v=1518090956232 howev...
Benedict K.
1

votes
1

answer
38

Views

How to make a CSV file for a web scraper?

I have a list of tuples that I need to put into a CSV file using pandas, but I am not sure how. I was thinking of putting them in a dict but that didn't work. Here is the list I am trying to get into a CSV. tables = soup.find_all('div', {'class':'pane'})[0].find_all('table') if (len(tables) > 4): product_...
user9269112
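
The question asks for pandas; as a dependency-free alternative, this sketch uses the standard-library `csv` module instead, which accepts a list of tuples directly. The product rows, the column names, and the helper `tuples_to_csv` are made up for illustration. It writes to an in-memory buffer so the result can be inspected; an open file object could be passed to `csv.writer` in the same way.

```python
import csv
import io


def tuples_to_csv(rows, header):
    """Serialize a header and a list of tuples to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped data: (name, price) tuples.
products = [("Widget", "9.99"), ("Gadget, large", "24.50")]
csv_text = tuples_to_csv(products, ["name", "price"])
print(csv_text)
```

Fields that contain a comma are quoted automatically, which is the main reason to prefer a CSV writer over joining strings by hand.
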
1

votes
0

answer
19

Views

Search tags by text which contain other tags

The following code shows that BeautifulSoup.find(text=) will not search for tags which contain some other tags. I am not sure if there is a way to do so. Could anybody let me know whether there is a way to perform such a search? Thanks. $ cat main.py #!/usr/bin/env python # vim: set noexpandtab tabsto...
user1424739
1

votes
2

answer
214

Views

BeautifulSoup: Saving changes back to HTML

I have the following Python code: content = webpage.content soup = Soup(content, 'html.parser') app_url = scheme + app_identity.get_default_version_hostname() + '/' for link in soup.find_all(href = True): if scheme in link['href']: link['href'] = link['href'].replace(scheme, app_url) logging.info('@...
T145
1

votes
0

answer
58

Views

find_all() function of BeautifulSoup doesn't return anything in cron server, but it does when run locally

I am struggling to find out why the find_all function of beautifulsoup doesn't work in cronjob- but my code works perfectly when run manually in my local machine. The print elements statement shows nothing which shouldn't be the case. This might be something important. I am using python2.7 and I hop...
user1111
1

votes
1

answer
30

Views

PhantomJS not retrieving correct data

I am trying to scrape a web page which has JavaScript in it using PhantomJS. I found the element for a button, and when I click it, it should render the next link. But I am not getting the output I want; instead, I am getting different output which is not required. The code is: from bs4 import Beau...
Thedeadman619
1

votes
2

answer
100

Views

Scraping data with Python and BeautifulSoup - can't extract div attribute content

I've been trying to extract some data from a website using Python and BeautifulSoup. I can't seem to find a way to extract the content of the div attributes. For example, from this: I'd like to extract the title and get the result: b I tried with this: for all_data in soup.find_all('div', {'class'...
Nina Mikulin
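
Attribute values are read from a tag like dictionary entries rather than through its text. A minimal sketch follows; the markup in the question was lost, so the div below (class `a`, title `b`) is an assumed reconstruction based on the expected result.

```python
from bs4 import BeautifulSoup

# Assumed markup: the first div carries a title attribute, the second does not.
html = '<div class="a" title="b">text</div><div class="a">no title</div>'
soup = BeautifulSoup(html, "html.parser")

# .get() returns None for a missing attribute, whereas div["title"] would raise KeyError.
titles = [div.get("title") for div in soup.find_all("div", {"class": "a"})]
print(titles)
```
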
1

votes
1

answer
69

Views

Shopify liquid pattern matching using Python

I would like to add a new line to shopify liquid file (basically it's an HTML file with some variables) after I find a pattern I look for in the liquid string. As this is a valid HTML file by saying after I mean add a new line as a new element in the html in a way that doesn't break the layout of th...
embedded
