I am writing a spider to extract text and the corresponding hyperlinks from a web page. Here is my spider code:

import scrapy

class GeneralElection2019Spider(scrapy.Spider):
    name = 'general_election_2019'
    allowed_domains = ['https://eci.gov.in/']
    start_urls = ['https://eci.gov.in/files/category/1359-general-election-2019/']

    def parse(self, response):
        print(f'\nProcessing: {response.url}\n')
        #data = response.css('.ipsType_break.ipsContained a::attr(href)').extract() # Hyperlink
        data = response.css('.ipsType_break.ipsContained a::attr(title)').extract() # Text
        for row in data:
            print(f'{row}\n')

I can get either the text or the hyperlink, but I want both at the same time.

Harsh Wardhan, May 11, 2021, 21:36

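1 answer

Best answer

You can try something like this. Loop over each list item and read the title, the text, and the link from the same anchor, so the values stay paired: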

import scrapy

class GeneralElection2019Spider(scrapy.Spider):
    name = 'general_election_2019'
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['eci.gov.in']
    start_urls = ['https://eci.gov.in/files/category/1359-general-election-2019/']

    def parse(self, response):
        print(f'\nProcessing: {response.url}\n')
        # Each li.ipsDataItem is one entry in the listing
        for data in response.css('li.ipsDataItem'):
            text = data.css('span.ipsType_break.ipsContained a::attr(title)').get()
            text2 = data.css('span.ipsType_break.ipsContained a::text').get()
            link = data.css('span.ipsType_break.ipsContained a::attr(href)').get()
            print(text)
            print(text2)
            print(link)
Samsul Islam, May 12, 2021, 09:52