python - Selecting the text contents of a particular div that has another div within it using Scrapy and XPath -

- May 15, 2014

EDIT: Solved! For those who encounter this in their own learning: the answer is below, well explained and provided by Paul.

This is my first question, and I have searched and searched (for two days) to no avail. I am trying to scrape a particular retail website to get product names and prices.

Currently, I have a spider that works on one retail website. On another retail website it only partly works: I can get the product name correctly, but I cannot get the price in the right format. First of all, this is my spider code:

    import scrapy
    from projectname.items import projectItem

    class spidername(scrapy.Spider):
        name = "whatever"
        allowed_domains = ["domain.com"]
        start_urls = ["http://www.domain.com"]

        def parse(self, response):
            sel = scrapy.Selector(response)
            requests = sel.xpath('//div[@class="container"]')
            product = requests.xpath('.//*[@class="productname"]/text()').extract()
            price = requests.xpath('.//*[@class="price"]').extract()  # Issue is located here
            itemlist = []
            for product, price in zip(product, price):
                item = projectItem()
                item['product'] = product
                # ...

Now, the target HTML for the price is:

    <div id="listprice1" class="price">
    $622<div class="cents">.00</div>
    </div>

As you can see, not only is this messy, there is a div within the div that I want to refer to. Now when I try:

    price = requests.xpath('.//*[@class="price"]/text()').extract()

It spits this out:

    product,price
    some_product1,$100
    some_product2,
    some_product3,$200
    some_product4,

when it should be:

    product,price
    some_product1,$100
    some_product2,$200
    some_product3,$300
    some_product4,$400

What I think it is doing: it is also extracting the div class="cents" and assigning it to the next product, so each following price gets pushed down by one row.

When I try to scrape the data through a Google Docs spreadsheet, it puts the product in one column and splits the price into two columns. The first is the $ amount and the second is the .00 cents, as shown below:

    product,price,cents
    some_product1,$100,.00
    some_product2,$200,.00
    some_product3,$300,.00
    some_product4,$400,.00

So my question is: how can I separate the div within the div? Is there a special way to exclude it in the XPath, or can I filter it out when I parse the data? And if I can filter it out, how do I do that?

Any help is greatly appreciated. Please understand that I am fairly new to Python and am doing my best to learn.
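The misalignment described in the question can be reproduced without Scrapy. The sketch below is only an illustration in plain Python: the product names and prices are the placeholder values from the question, and the list of text nodes per price element is an assumption about what `/text()` returns for that markup (an amount, then a whitespace-only node after the inner "cents" div).

```python
# Placeholder product names taken from the question's sample output.
products = ["some_product1", "some_product2", "some_product3", "some_product4"]

# Assumed text nodes found directly under each "price" element:
# the amount before the inner div, then a whitespace-only node after it.
price_nodes_per_element = [
    ["\n$100", "\n"],
    ["\n$200", "\n"],
    ["\n$300", "\n"],
    ["\n$400", "\n"],
]

# Flattening all text nodes into one list doubles its length,
# so zip() pairs products with the wrong entries and stops early.
flat_nodes = [text.strip() for nodes in price_nodes_per_element for text in nodes]
misaligned = list(zip(products, flat_nodes))

# Taking the first text node of each price element keeps one price per product.
aligned = [
    (product, nodes[0].strip())
    for product, nodes in zip(products, price_nodes_per_element)
]

print(misaligned)
print(aligned)
```

The first list shows every second product with an empty price, which matches the output reported in the question. The second list pairs each product with its own price because the extraction happens per element rather than over one flat list.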
 
Here are various XPath patterns:

    >>> import scrapy
    >>> selector = scrapy.Selector(text="""<div id="listprice1" class="price">
    ... $622<div class="cents">.00</div>
    ... </div>""")

    # /text() will select all text nodes directly under the context node,
    # here any element with class "price";
    # there are 2 of them
    >>> selector.xpath('.//*[@class="price"]/text()').extract()
    [u'\n$622', u'\n']

    # if you wrap the context node inside the "string()" function,
    # you get the string representation of the node,
    # basically a concatenation of its text elements
    >>> selector.xpath('string(.//*[@class="price"])').extract()
    [u'\n$622.00\n']

    # using "normalize-space()" instead of "string()",
    # runs of whitespace are replaced with 1 space character
    >>> selector.xpath('normalize-space(.//*[@class="price"])').extract()
    [u'$622.00']

    # you can also ask for the 1st text node under the "price" element
    >>> selector.xpath('.//*[@class="price"]/text()[1]').extract()
    [u'\n$622']

    # the space-normalized version of that
    >>> selector.xpath('normalize-space(.//*[@class="price"]/text()[1])').extract()
    [u'$622']
    >>>

Finally, you could end up with this parse method:

    def parse(self, response):
        sel = scrapy.Selector(response)
        requests = sel.xpath('//div[@class="container"]')
        itemlist = []
        for r in requests:
            item = projectItem()
            item['product'] = r.xpath('normalize-space(.//*[@class="productname"])').extract()
            item['price'] = r.xpath('normalize-space(.//*[@class="price"]/text()[1])').extract()
            itemlist.append(item)
        return itemlist
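The same ideas can be tried without installing Scrapy. The following is a minimal sketch that swaps in the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (no `text()` or `normalize-space()`), so those two pieces are imitated in Python. The sample page, the `listprice2` id, and the helper names `normalize_space` and `parse_products` are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page modelled on the markup in the question.
PAGE = """<html><body>
<div class="container">
  <span class="productname"> some_product1 </span>
  <div id="listprice1" class="price">
$622<div class="cents">.00</div>
</div>
</div>
<div class="container">
  <span class="productname"> some_product2 </span>
  <div id="listprice2" class="price">
$200<div class="cents">.00</div>
</div>
</div>
</body></html>"""


def normalize_space(text):
    """Imitate XPath normalize-space(): trim and collapse whitespace runs."""
    return " ".join((text or "").split())


def parse_products(page):
    """Return one dict per container so product and price stay paired."""
    root = ET.fromstring(page)
    rows = []
    for container in root.iterfind('.//div[@class="container"]'):
        name = container.find('.//*[@class="productname"]')
        price = container.find('.//*[@class="price"]')
        rows.append({
            "product": normalize_space("".join(name.itertext())),
            # price.text is the text before the first child element,
            # the counterpart of text()[1] in the XPath patterns.
            "price": normalize_space(price.text),
            # itertext() walks every descendant text node, like string().
            "price_with_cents": normalize_space("".join(price.itertext())),
        })
    return rows


rows = parse_products(PAGE)
for row in rows:
    print(row)
```

Looping over each container and querying relative to it mirrors the `for r in requests` loop in the parse method, so each product is read together with its own price instead of from two separately extracted lists.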

 




  


















