
Title: [Troubleshooting] Python requests problem

Author: geffon    Time: 2018-11-29 06:15     Title: Python requests problem

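I wrote a small program for web crawling.
Each instance of the program connects through a proxy and then crawls.
When I run more than 100 instances at the same time, each using a different proxy, every instance beyond the 100th either fails to connect or is so slow that it might as well not connect.
What is causing this, and how can I fix it?

(The program doesn't use threads, multitasking, or multithreading.)
Running on Windows.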
Author: sonichkhk    Time: 2018-11-29 07:10

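I wrote a small program for web crawling.
Each instance of the program connects through a proxy and then crawls.
When I run more than 100 instances at the same time, each using a different p ...
geffon posted on 2018-11-29 06:15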


If it isn't a proxy problem, you could try gevent.
Author: samiux    Time: 2018-11-29 08:05

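I wrote a small program for web crawling.
Each instance of the program connects through a proxy and then crawls.
When I run more than 100 instances at the same time, each using a different p ...
geffon posted on 2018-11-29 06:15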



The reason you cannot connect to the targets, or are slowed down, is that you are running out of resources. I also wonder why you open 100 programs to do the same thing instead of using threading.
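A minimal sketch of what threading inside one program could look like, assuming the job is "fetch each URL through one of a list of proxies". `crawl`, `fetch_via_proxy` and the round-robin proxy assignment are hypothetical illustrations, not the original poster's code; it uses the standard library's urllib so it runs without extra packages. With requests, the per-request equivalent of `fetch_via_proxy` would be `requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=...)`.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

def fetch_via_proxy(url: str, proxy: str, timeout: float = 10.0) -> bytes:
    # Route this single request through `proxy`, e.g. "http://10.0.0.1:8080".
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()

def crawl(urls, proxies, fetch=fetch_via_proxy, max_workers=20):
    # Pair each URL with a proxy round-robin, then run the jobs on a bounded
    # pool of threads inside ONE process instead of 100+ separate programs.
    jobs = list(zip(urls, cycle(proxies)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `jobs`.
        return list(pool.map(lambda job: fetch(*job), jobs))
```

Because `fetch` is a parameter, a fake function can be passed in for testing without touching the network. `max_workers` keeps the number of simultaneous connections bounded, which 100+ independent processes cannot do.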
Author: super_hkg    Time: 2018-11-29 09:53

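Python multiprocessing/multithreading eats a lot of resources. Using a library to fetch data in a single thread, one request at a time, is sometimes even faster.
Opening a new connection is very slow; you should reuse the connections you have already opened.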
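To illustrate the point about reusing connections, a minimal sketch using only the standard library: a throwaway local HTTP server records every TCP connection it accepts, and `fetch_many` (a hypothetical helper) sends the same requests either over one kept-alive `http.client.HTTPConnection` or over a fresh connection each time. With requests, the usual way to get this reuse is to create one `requests.Session()` and call `session.get(...)` on it for every request, since a Session keeps a connection pool.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

connections = []  # one entry per TCP connection the server accepts

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive

    def setup(self):
        super().setup()
        connections.append(self.client_address)  # runs once per connection

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def fetch_many(host, port, n, reuse=True):
    # reuse=True: all n requests share one connection.
    # reuse=False: every request pays for a new TCP handshake.
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        for _ in range(n):
            if not reuse:
                conn.close()
                conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
    finally:
        conn.close()

if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    fetch_many(host, port, 20, reuse=True)
    print("reused connection, TCP connections opened:", len(connections))
    connections.clear()
    fetch_many(host, port, 20, reuse=False)
    print("new connection each time, TCP connections opened:", len(connections))
    server.shutdown()
    server.server_close()
```

Expect 1 connection for the reused case and 20 for the other; each extra connection is another TCP handshake (plus a TLS handshake for HTTPS, and a proxy round trip when going through a proxy).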
Author: 燒浩    Time: 2018-11-29 10:49

Reply to #4 super_hkg


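Agreed. I once used a library to export mails from Exchange. By default it ran multiple threads, used up several GB of RAM, and wasn't even fast. In the end I switched it to a single thread, and it became both smoother and more stable.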
Author: geffon    Time: 2018-11-30 07:31

The reason why you cannot connect to the targets or slowed down is that you are running out o ...
samiux posted on 2018-11-29 08:05



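That must be the cause. Every time I get past 100 instances, CPU usage hits 100%...
I tried gevent, and the CPU simply couldn't cope; the machine became completely unusable.
I don't know how to solve it.

I tried rewriting the same code in Go: no problems at all, and CPU usage is only a few percent.
But Go has far too few libraries. It isn't as convenient as Python, which makes it a real hassle.

Does anyone know whether a crawler written in Java is CPU-hungry? Would it be much better than Python?
It would likewise just be an HTTP client sending requests over and over.
(I'm asking because Java has more libraries.)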
Author: takayo72    Time: 2018-12-1 08:21

Note: The author has been banned or deleted; content automatically hidden.
Author: ip4368    Time: 2018-12-18 18:24

Python multiprocessing/multithreading eats a lot of resources. Using a library to fetch data in a single thread, one request at a time, is sometimes even faster, ...
super_hkg posted on 2018-11-29 09:53


Python doesn't really have threads because of the famous GIL. Multiprocessing is achieved by forking (on Linux) or through a low-level library, which is definitely not as flexible as pure Python.
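A small experiment showing what the GIL does and does not limit, meant as a sketch rather than a benchmark. CPython threads are real OS threads, but only one of them runs Python bytecode at a time; blocking calls such as socket I/O (and `time.sleep`, used here as a stand-in for waiting on a server) release the GIL, so threads still overlap well for I/O-bound crawling. `io_task`, `cpu_task` and `timed` are hypothetical names. On a standard (GIL) CPython build, expect the CPU-bound case to gain little or nothing from extra threads. Also note that the original poster is on Windows, where `multiprocessing` starts workers with the "spawn" method, since fork is not available there.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    # Stand-in for waiting on the network; sleeping releases the GIL.
    time.sleep(0.2)

def cpu_task(_):
    # Pure-Python arithmetic holds the GIL, so threads cannot run it in parallel.
    return sum(i * i for i in range(200_000))

def timed(fn, workers, jobs=8):
    # Run `jobs` calls of fn on a pool of `workers` threads; return seconds taken.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fn, range(jobs)))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"I/O-bound, 1 thread : {timed(io_task, 1):.2f}s")  # at least 8 x 0.2 s
    print(f"I/O-bound, 8 threads: {timed(io_task, 8):.2f}s")  # close to 0.2 s
    print(f"CPU-bound, 1 thread : {timed(cpu_task, 1):.2f}s")
    print(f"CPU-bound, 8 threads: {timed(cpu_task, 8):.2f}s")
```

A crawler mostly waits on the network, so it sits on the I/O-bound side; 100% CPU usage is a sign that something other than waiting (for example, per-process startup and memory overhead across 100+ interpreters) is the bottleneck.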
Author: ip4368    Time: 2018-12-18 18:27

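That must be the cause. Every time I get past 100 instances, CPU usage hits 100%...
I tried gevent, and the CPU simply couldn't cope; the machine became completely ...
geffon posted on 2018-11-30 07:31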


Go should be very feasible, though. Go's philosophy is not to ship many official libraries; instead, almost anything can be pulled in from a Git repository. You can import packages from what is essentially a URL, and when you build, the Go toolchain fetches the code from that repository, so there should be plenty of community support.
Author: chocostang    Time: 2018-12-19 11:16

Try 10 first, then increase gradually.
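A minimal sketch of "start with 10 and increase gradually", assuming throughput (URLs fetched per second) is the thing to watch. `run_batch` and `find_concurrency` are hypothetical helpers, and the step of 10 and the 5% `min_gain` threshold are arbitrary choices; `fetch` is any function that takes one URL. The search stops adding workers as soon as more of them no longer makes the batch meaningfully faster.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_batch(fetch, urls, workers):
    # Fetch every URL using `workers` threads; return elapsed wall-clock seconds.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fetch, urls))
    return time.perf_counter() - start

def find_concurrency(fetch, urls, start=10, step=10, max_workers=200, min_gain=0.05):
    # Begin at `start` workers and keep adding `step` while throughput
    # still improves by at least `min_gain` (5% by default).
    best_workers = start
    best_rate = len(urls) / run_batch(fetch, urls, start)
    workers = start + step
    while workers <= max_workers:
        rate = len(urls) / run_batch(fetch, urls, workers)
        print(f"{workers:4d} workers: {rate:8.1f} URLs/s")
        if rate < best_rate * (1 + min_gain):
            break  # no meaningful gain: stop ramping up
        best_workers, best_rate = workers, rate
        workers += step
    return best_workers
```

Each level tried fetches the whole sample again, so a small, representative URL sample is enough; the returned worker count is a starting point, not a guarantee, since proxy and target-site limits can change over time.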





Welcome to 電腦領域 HKEPC Hardware (https://h1.hkepc.com/forum/) Powered by Discuz! 7.2