I am trying to search the SEC website for the first occurrence of "10-Q" or "10-K" and grab the link found under the "Interactive Data" button on the site.

The URL I am trying to get the link from is:

https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=AAPL&type=&dateb=200506&owner=exclude&count=40

The resulting link should be:

https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v

The code I am currently using:

import requests
from bs4 import BeautifulSoup

date1 = "20200506"
ticker = "AAPL"

URL = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=' + ticker + '&type=&dateb=' + date1 + '&owner=exclude&count=40'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find(id='seriesDiv')

rows = results.find_all('tr')

for row in rows:
    document = row.find('td', string='10-Q')
    link = row.find('a', id="interactiveDataBtn")
    if None in (document, link):
        continue
    print(document.text)
    print(link['href'])

This code returns all the 10-Q links, but it should cover both 10-Q and 10-K.

Can someone help me reshape this code so that it only returns the link for the first occurrence of a 10-Q or 10-K?

Thank you

Jackey12345 July 2, 2020, 20:52

1 answer

Best Answer

The quickest solution is to pass a lambda to the .find() method, so that a single call matches cells containing either form type.

For example:

import requests
from bs4 import BeautifulSoup

date1 = "20200506"
ticker = "AAPL"

URL = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=' + ticker + '&type=&dateb=' + date1 + '&owner=exclude&count=40'
page = requests.get(URL)

soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find(id='seriesDiv')
rows = results.find_all('tr')

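for row in rows:
    # Match the first <td> whose text contains either form type.
    document = row.find(lambda t: t.name=='td' and ('10-Q' in t.text or '10-K' in t.text))
    link = row.find('a', id="interactiveDataBtn")
    # Skip rows that lack a matching form cell or an Interactive Data button.
    if None in (document, link):
        continue
    print(document.text)
    # The href is relative, so prepend the host to get a full URL.
    print('https://www.sec.gov' + link['href'])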

This prints the 10-Q and 10-K links:

10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000010&xbrl_type=v
10-K
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000119&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000076&xbrl_type=v
10-Q
https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000066&xbrl_type=v
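
One caveat with the substring test in the lambda: `'10-K' in t.text` would also match cells such as `10-K/A` (an amended filing) or `NT 10-K` (a late-filing notice) if they show up in the list. A minimal sketch of a stricter check follows; the names `TARGET_FORMS` and `is_target_form` are hypothetical helpers introduced here, not part of the original answer.

```python
# Hypothetical helper: exact-match test for the form-type cell text.
TARGET_FORMS = {"10-Q", "10-K"}

def is_target_form(cell_text):
    """Return True only when the cell text is exactly 10-Q or 10-K."""
    # strip() drops surrounding whitespace and newlines from the table cell.
    return cell_text.strip() in TARGET_FORMS

# Assumed usage inside the loop, replacing the substring test:
#   document = row.find(lambda t: t.name == 'td' and is_target_form(t.text))

print(is_target_form(" 10-Q\n"))
print(is_target_form("10-K/A"))
```

Whether amended filings should be excluded depends on what you need, so treat this as an option rather than a required change.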

EDIT: To get only the first occurrence, you can use a dictionary. Each iteration checks whether the key (the string 10-Q or 10-K) is already in the dictionary and, if not, adds it:

links = dict()
for row in rows:
    document = row.find(lambda t: t.name=='td' and ('10-Q' in t.text or '10-K' in t.text))
    link = row.find('a', id="interactiveDataBtn")
    if None in (document, link):
        continue
    # Keep only the first link seen for each form type.
    if document.text not in links:
        links[document.text] = 'https://www.sec.gov' + link['href']

print(links)

Prints:

{'10-Q': 'https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v', 
 '10-K': 'https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000119&xbrl_type=v'}
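
The dictionary keeps the first link for each form type. If only a single link is wanted (whichever of the two forms appears first on the page), the loop can simply stop at the first match. The sketch below shows that logic on plain `(form text, href)` pairs so it runs without a network call; `first_filing_link` and the `sample` list are hypothetical, with the two hrefs copied from the output above and the `8-K` row made up to stand for a row without an Interactive Data button.

```python
BASE_URL = "https://www.sec.gov"
TARGET_FORMS = {"10-Q", "10-K"}

def first_filing_link(pairs):
    """Return (form, absolute_url) for the first 10-Q or 10-K with a link, else None."""
    for form, href in pairs:
        form = form.strip()
        # Skip other form types and rows that have no Interactive Data link.
        if form in TARGET_FORMS and href is not None:
            return form, BASE_URL + href
    return None

# Illustrative stand-in for the (document.text, link['href']) values per row.
sample = [
    ("8-K", None),
    ("10-Q", "/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000052&xbrl_type=v"),
    ("10-K", "/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-19-000119&xbrl_type=v"),
]

print(first_filing_link(sample))
```

In the scraping loop itself, the same effect would come from printing the link and then calling `break` as soon as both `document` and `link` are found.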
Andrej Kesely July 2, 2020, 18:27