python - Selecting the text contents of a particular <div> that has another <div> within it using Scrapy and XPath
EDIT: Solved! For those who come across this in their own learning: the answer is below, well explained and provided by Paul.
This is my first question, and I have searched and searched (for two days) to no avail. I am trying to scrape a particular retail website to get product names and prices.
Currently, I have a spider that works on one particular retail website. With another retail website it only partly works: I can get the product name correctly, but I cannot get the price in the right format.

The target HTML for the price is:

    <div id="listprice1" class="price">$622 <div class="cents">.00</div></div>

As you can see, not only is this messy, there is a div within the div that I want to refer to. Now when I try this:

    price = requests.xpath('.//*[@class="price"]/text()').extract()

it spits out this:

    product,price
    some_product1,$100
    some_product2,
    some_product3,$200
    some_product4,

whereas I expect this:

    product,price
    some_product1,$100
    some_product2,$200
    some_product3,$300
    some_product4,$400

What I think it is doing is also extracting the div class="cents" and assigning it to the next product, so that each following value gets pushed down by one.

When I try to scrape the data through Google Docs spreadsheets, it puts the product in one column and splits the price across two columns: the first is the $ amount and the second is the .00 cents, as shown below:

    product,price,cents
    some_product1,$100,.00
    some_product2,$200,.00
    some_product3,$300,.00
    some_product4,$400,.00

So my question is: how do I separate the div within the div? Is there a particular way to exclude it in the XPath, or can I filter it out when I parse the data? And if I can filter it out, how do I do that?

Any help is greatly appreciated. Please understand that I am fairly new to Python and am doing my best to learn.
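On the filtering option raised in the question, one possible sketch in plain Python is to join whatever text pieces the spider extracted and strip the whitespace afterwards. It assumes the price pieces arrive as a list of strings; `clean_price` and `price_to_decimal` are hypothetical helper names used only for illustration, not part of Scrapy.

```python
import re
from decimal import Decimal


def clean_price(text_nodes):
    # Hypothetical helper: glue the extracted pieces together and
    # remove every whitespace character (spaces, newlines, tabs).
    return re.sub(r"\s+", "", "".join(text_nodes))


def price_to_decimal(price):
    # Hypothetical helper: drop the leading "$" and any thousands
    # separators so the amount can be handled as a number.
    return Decimal(price.lstrip("$").replace(",", ""))


# Assumed input: the dollar part, the cents part, and stray whitespace.
pieces = ["\n$622 ", ".00", "\n"]
price = clean_price(pieces)
amount = price_to_decimal(price)
print(price, amount)
```

In a spider, `pieces` would be the list returned by `.extract()`. Selecting `//text()` under the price element instead of `/text()` should include the text of the nested cents div as well, so that there is something to join.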
Let's explore a few different XPath patterns:

    >>> import scrapy
    >>> selector = scrapy.Selector(text="""<div id="listprice1" class="price">
    ... $622 <div class="cents">.00</div>
    ... </div>""")

    # /text() selects the text nodes that are direct children of the context node,
    # here any element with the class "price"; the nested ".00" is not one of them
    >>> selector.xpath('.//*[@class="price"]/text()').extract()
    [u'\n$622 ', u'\n']

    # If you wrap the context node inside the string() function,
    # you get the string representation of the node,
    # basically a concatenation of all its text nodes
    >>> selector.xpath('string(.//*[@class="price"])').extract()
    [u'\n$622 .00\n']

    # Using normalize-space() instead of string() trims the ends
    # and replaces each run of whitespace with 1 space character
    >>> selector.xpath('normalize-space(.//*[@class="price"])').extract()
    [u'$622 .00']

    # You can also ask for the 1st text node under the "price" element
    >>> selector.xpath('.//*[@class="price"]/text()[1]').extract()
    [u'\n$622 ']

    # The space-normalized version is the one you want
    >>> selector.xpath('normalize-space(.//*[@class="price"]/text()[1])').extract()
    [u'$622']

Finally, you could be after this approach:
    def parse(self, response):
        sel = scrapy.Selector(response)
        # One selector per product container on the page
        requests = sel.xpath('//div[@class="container"]')
        items = []
        for r in requests:
            item = ProjectItem()
            # normalize-space() trims and collapses whitespace in the result
            item['product'] = r.xpath('normalize-space(.//*[@class="productname"])').extract()
            # text()[1] keeps only the dollar part that precedes the nested cents div
            item['price'] = r.xpath('normalize-space(.//*[@class="price"]/text()[1])').extract()
            items.append(item)
        return items
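To see why these expressions behave differently, it can help to look at how the price markup splits into text nodes. The sketch below uses the standard library's `xml.etree.ElementTree` rather than Scrapy, which only works here because this particular fragment happens to be well-formed XML; real pages usually need Scrapy's selectors. The variable names are illustrative, and the XPath comparisons in the comments are approximate.

```python
import xml.etree.ElementTree as ET

fragment = (
    '<div id="listprice1" class="price">\n'
    '$622 <div class="cents">.00</div>\n'
    '</div>'
)
price = ET.fromstring(fragment)
cents = price.find("div")  # the nested <div class="cents">

# Text directly inside the outer div, before the nested div:
# roughly what text()[1] selects.
first_text = price.text

# Text directly inside the outer div, after the nested div closes:
# the second node that /text() returns.
trailing_text = cents.tail

# Text that belongs to the nested div, which /text() on the outer div skips.
cents_text = cents.text

# Every text node in document order, concatenated: roughly string().
full_text = "".join(price.itertext())

# Trimming the ends and collapsing whitespace runs: roughly normalize-space().
normalized = " ".join(full_text.split())

print(repr(first_text), repr(trailing_text), repr(cents_text))
print(repr(full_text), repr(normalized))
```

The outer div owns two text nodes (the dollar amount and a trailing newline), while ".00" belongs to the nested div. That is why `/text()` on the price element returns two strings and neither of them contains the cents.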