python - UnicodeEncodeError when writing to CSV after scraping


I am learning web scraping, and for practice I am trying to scrape a table of Derek Jeter's batting statistics. Using Beautiful Soup, I was able to extract the table like this:

  from bs4 import BeautifulSoup
  import urllib2

  jeter = "http://www.baseball-reference.com/players/j/jeterde01-bat.shtml"
  page = urllib2.urlopen(jeter)
  soup = BeautifulSoup(page)
  table = soup.find('table', id='batting_standard')

  # The list of header names
  tableheaders = table.find_all('th')
  header = []
  i = 0
  while i

table_rows ends up as a list in which each element is another list representing one row of the table:

  table_rows = [['a', 'b', 'c'],

When I try to write it to a CSV file using this code:

  import csv

  with open('/home/russell/Desktop/python learning/web scraping/jeter.csv', 'wb') as fp:
      a = csv.writer(fp)
      # Write the header row of the CSV file
      a.writerow(headers)
      for e in table_rows:
          a.writerow(e)

I get the following error message:

  Traceback (most recent call last):
    File "/home/russell/Documents/python learning/web scraping/jeter_scrape2.py", line 39, in <module>
      a.writerow(e)
  UnicodeEncodeError: 'ascii' codec can't encode characters in position 4-5: ordinal not in range(128)

I'm new to all this, so any help you can provide would be greatly appreciated.

Note: I am using Python 2.7

Edit:

The row it is getting hung up on is: u'127', u'203', u'25', u'8', u'19', u'84', u'30', u'6', u'57', u'11', u'4.34', u'3 84', u'481', u'864', u'127', u'301', u'13', u'5', u'3', u'3', u'1', u'6']

I assume the culprit is the first element (u'1998\xa0\u2605')? It is supposed to be the year, 1998. The CSV writer handles the first several rows fine, but it gets hung up on this one.

The problem is that the HTML table you are scraping contains a U+2605 BLACK STAR character, preceded by a non-breaking space (U+00A0). You can see the star on the page next to "1999". You can simply strip those characters from the first column before writing each row:

  for e in table_rows:
      e[0] = e[0].replace(u'\xa0\u2605', u'')
      a.writerow(e)

Something like that.
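For anyone who wants to see the failure in isolation, here is a minimal sketch. It relies on the fact that Python 2's csv module does not handle unicode objects itself, so a non-ASCII cell ends up going through an implicit ASCII encode. The sample row and the helper names (ascii_error, clean_cell, utf8_row) are made up for illustration and are not part of the original script. The last helper shows one possible alternative, assuming you would rather keep the star: encode each cell to UTF-8 bytes before handing the row to the writer under Python 2.

```python
# Hypothetical row modelled on the one quoted in the question; only the
# first cell carries the non-breaking space and the black star.
row = [u'1998\xa0\u2605', u'127', u'203']


def ascii_error(cell):
    """Return the UnicodeEncodeError raised by an ASCII encode, or None."""
    try:
        cell.encode('ascii')
    except UnicodeEncodeError as err:
        return err
    return None


def clean_cell(cell):
    """Drop the non-breaking space + black star marker (illustrative helper)."""
    return cell.replace(u'\xa0\u2605', u'')


def utf8_row(cells):
    """Alternative for Python 2: keep the star, encode each cell as UTF-8 bytes."""
    return [c.encode('utf-8') for c in cells]


# The raw cell fails on the two characters after '1998' ...
err = ascii_error(row[0])
print(err)

# ... while the cleaned cell is plain ASCII and encodes without error.
cleaned = [clean_cell(c) for c in row]
print(ascii_error(cleaned[0]))
```

Stripping the marker is the simplest fix when the star carries no information you need. If it does, writing UTF-8 encoded rows keeps it, at the cost of having to open the resulting file as UTF-8 later.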
