python - Selecting the text contents of a particular div that has another div within it using Scrapy and XPath


EDIT: Solved! For those who run into this in their own learning: the answer below, provided by Paul, explains it well.

This is my first question, and I have searched and searched (for two days) to no avail. I am trying to scrape a particular retail website to get product names and prices.

Currently I have a spider that works on one retail website. On another retail website, however, I can get the product name correctly but cannot get the price in the right format. First of all, this is my spider code:

    import scrapy
    from projectname.items import projectitem

    class WhateverSpider(scrapy.Spider):
        name = "whatever"
        allowed_domains = ["domain.com"]
        start_urls = ["http://www.domain.com"]

        def parse(self, response):
            sel = scrapy.Selector(response)
            requests = sel.xpath('//div[@class="container"]')
            product = requests.xpath('.//*[@class="productname"]/text()').extract()
            price = requests.xpath('.//*[@class="price"]').extract()  # Issue is located here.
            itemlist = []
            for product, price in zip(product, price):
                item = projectitem()
                item['product'] = product
                item['price'] = price
                itemlist.append(item)
            return itemlist

Now, the HTML I am targeting for the price is:

    <div id="listprice1" class="price">
    $622<div class="cents">.00</div>
    </div>

As you can see, not only is this messy, there is also a div nested inside the div that I want to target. Now when I try:

    price = requests.xpath('.//*[@class="price"]/text()').extract()

It spits out this:

    product,price
    some_product1,$100
    some_product2,
    some_product3,$200
    some_product4,

when it should be:

    product,price
    some_product1,$100
    some_product2,$200
    some_product3,$300
    some_product4,$400

What I think it is doing is this: it also extracts the div class="cents" and assigns it to the next product, so every following price gets pushed down by one.
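The shift described above can be reproduced without Scrapy. This is only a minimal sketch: the two lists are hypothetical stand-ins for the flat results of `extract()`, and it assumes each price element yields two text nodes (the amount, then a whitespace-only node), as the session in the answer shows for the sample markup.

```python
# Hypothetical stand-ins for the two flat lists returned by extract();
# the values are illustrative, not scraped from a real page.
products = ["some_product1", "some_product2", "some_product3", "some_product4"]

# Assumption: every price element contributes two text nodes, the amount
# and then a whitespace-only node after the nested "cents" div.
price_nodes = ["\n$100", "\n", "\n$200", "\n", "\n$300", "\n", "\n$400", "\n"]

# zip() pairs strictly by position and stops at the shorter list, so every
# second product ends up paired with a whitespace-only node.
rows = [(product, price.strip()) for product, price in zip(products, price_nodes)]

for product, price in rows:
    print(f"{product},{price}")
```

Under that assumption, the blank prices come from the extra text node per price element rather than from the cents value itself, which is why only every other row is filled.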

When I scrape the same data through a Google Docs spreadsheet, it keeps the product in one column and splits the price across two columns: the first holds the dollar amount and the second holds the cents, as shown below:

    product,price,cents
    some_product1,$100,.00
    some_product2,$200,.00
    some_product3,$300,.00
    some_product4,$400,.00

So my question is: how do I separate the div within the div? Is there a specific way to exclude it in the XPath, or can I filter it out while parsing the data? And if I can filter it, how do I do that?
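On the "filter it while parsing" option, one possibility is to drop whitespace-only strings from the extracted list before pairing it with the product names. The sketch below is an illustration only: `drop_blank_nodes` is a hypothetical helper, and the lists are made-up stand-ins for the real `extract()` results.

```python
def drop_blank_nodes(text_nodes):
    """Strip each extracted string and discard the ones that are only whitespace."""
    stripped = (node.strip() for node in text_nodes)
    return [node for node in stripped if node]


# Hypothetical stand-ins for the lists returned by extract().
products = ["some_product1", "some_product2", "some_product3", "some_product4"]
price_nodes = ["\n$100", "\n", "\n$200", "\n", "\n$300", "\n", "\n$400", "\n"]

prices = drop_blank_nodes(price_nodes)
rows = list(zip(products, prices))

for product, price in rows:
    print(f"{product},{price}")
```

This still pairs two flat lists by position, so a product with no price at all would shift every later row again. Querying each container separately, as the answer does, avoids that.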

Any help is greatly appreciated. Please understand that I am fairly new to Python and doing my best to learn.

Let's go through various XPath patterns:

    >>> import scrapy
    >>> selector = scrapy.Selector(text="""<div id="listprice1" class="price">
    ... $622<div class="cents">.00</div>
    ... </div>""")

    # /text() selects all the text nodes that are direct children of the context node;
    # here, any element with class "price".
    # There are 2 of them.
    >>> selector.xpath('.//*[@class="price"]/text()').extract()
    [u'\n$622', u'\n']

    # If you wrap the context node inside the "string()" function,
    # you get the string representation of the node,
    # basically a concatenation of its text elements.
    >>> selector.xpath('string(.//*[@class="price"])').extract()
    [u'\n$622.00\n']

    # Using "normalize-space()" instead of "string()"
    # strips leading and trailing whitespace and replaces
    # runs of whitespace with 1 space character.
    >>> selector.xpath('normalize-space(.//*[@class="price"])').extract()
    [u'$622.00']

    # You can also ask for the 1st text node under the "price" element.
    >>> selector.xpath('.//*[@class="price"]/text()[1]').extract()
    [u'\n$622']

    # The space-normalized version of that may be what you want.
    >>> selector.xpath('normalize-space(.//*[@class="price"]/text()[1])').extract()
    [u'$622']
    >>>

In the end, you may be after this pattern:
    def parse(self, response):
        sel = scrapy.Selector(response)
        requests = sel.xpath('//div[@class="container"]')
        itemlist = []
        # Query relative to each container so product and price stay paired.
        for r in requests:
            item = projectitem()
            item['product'] = r.xpath('normalize-space(.//*[@class="productname"])').extract()
            item['price'] = r.xpath('normalize-space(.//*[@class="price"]/text()[1])').extract()
            itemlist.append(item)
        return itemlist
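To check the per-container idea without running a spider, here is a rough standard-library analogue. It is a sketch under assumptions, not Scrapy's API: the `PAGE` fragment is a hypothetical page modelled on the markup in the question, `string_value` and `normalize_space` are illustrative helpers that only approximate XPath's `string()` and `normalize-space()`, and `xml.etree.ElementTree` needs well-formed markup, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment modelled on the markup in the question.
PAGE = """<body>
<div class="container">
  <span class="productname"> some_product1 </span>
  <div class="price">
$100<div class="cents">.00</div>
  </div>
</div>
<div class="container">
  <span class="productname">some_product2</span>
  <div class="price">
$200<div class="cents">.00</div>
  </div>
</div>
</body>"""


def string_value(element):
    """Concatenate every descendant text node, like XPath string()."""
    return "".join(element.itertext())


def normalize_space(text):
    """Trim and collapse whitespace runs, approximating XPath normalize-space()."""
    return " ".join((text or "").split())


root = ET.fromstring(PAGE)
items = []
# Look up name and price inside each container, so they cannot drift apart.
for container in root.findall(".//div[@class='container']"):
    name = container.find(".//*[@class='productname']")
    price = container.find(".//*[@class='price']")
    items.append({
        "product": normalize_space(string_value(name)),
        # element.text is the text before the first child element, which
        # corresponds to text()[1] for this particular markup.
        "price": normalize_space(price.text),
        # The whole string value also includes the nested "cents" div.
        "full_price": normalize_space(string_value(price)),
    })

for item in items:
    print(item)
```

Because each lookup is relative to one container, a stray text node inside a price element can no longer shift the prices of later products.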
