[Troubleshooting] Python requests problem

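I wrote a small program for web crawling.
Each instance connects to a proxy and then crawls.
When I run more than 100 at the same time, each connected to a different proxy, every instance beyond the 100th either cannot connect or is so slow that it might as well not be connected.
What is the problem, and how can I solve it?

(The program does not use threads, multitasking, or multithreading.)
It runs on Windows.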

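I wrote a small program for web crawling.
Each instance connects to a proxy and then crawls.
When I run more than 100 at the same time, each connected to a different p ...
geffon posted on 2018-11-29 06:15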


If the proxies are not the problem, you could try gevent.


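I wrote a small program for web crawling.
Each instance connects to a proxy and then crawls.
When I run more than 100 at the same time, each connected to a different p ...
geffon posted on 2018-11-29 06:15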



The reason you cannot connect to the targets, or connect only slowly, is that you are running out of resources. Meanwhile, I wonder why you open 100 programs to do the same thing instead of using threading.
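
A minimal sketch of the threading approach suggested above: one process, a bounded thread pool, and one proxy per job. It uses only the standard library (`urllib.request`) rather than `requests`. The names `fetch_via_proxy` and `crawl_all`, the worker count, and the timeout are illustrative assumptions, not the original poster's code.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_via_proxy(url, proxy, timeout=10):
    """Fetch one URL through one HTTP proxy (hypothetical helper)."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()


def crawl_all(jobs, fetch=fetch_via_proxy, max_workers=20):
    """Run (url, proxy) jobs in one process with a bounded thread pool.

    Returns a dict mapping each job to ("ok", body) or ("error", message).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url, proxy): (url, proxy)
                   for url, proxy in jobs}
        for future in as_completed(futures):
            job = futures[future]
            try:
                results[job] = ("ok", future.result())
            except Exception as exc:  # one dead proxy should not stop the rest
                results[job] = ("error", str(exc))
    return results
```

With this shape, 100 or more (url, proxy) pairs share one interpreter, and `max_workers` caps how many requests are in flight at once.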


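Python multi-process/multi-thread setups eat a lot of resources; using a library to fetch data in a single thread, one request at a time, is sometimes even faster.
Opening a new connection is extremely slow; you need to reuse the connections you have already opened.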
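
To illustrate the point about reusing connections, here is a minimal sketch using the standard library's `http.client`; with `requests`, a `requests.Session` plays the same role by pooling connections. The function name `fetch_many` and its parameters (including the injectable `make_conn`, which exists only so the function can be exercised without a network) are illustrative assumptions.

```python
import http.client


def fetch_many(host, port, paths, timeout=10,
               make_conn=http.client.HTTPConnection):
    """Fetch several paths from one host over a single connection object."""
    conn = make_conn(host, port, timeout=timeout)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # The body must be read in full before the same socket
            # can carry the next request.
            body = resp.read()
            results.append((resp.status, body))
    finally:
        conn.close()
    return results
```

The connection is opened once and reused for every path, so the TCP (and, for HTTPS, TLS) setup cost is paid once as long as the server keeps the connection alive.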


Reply to 4# super_hkg


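Agreed. I once used a library to export mail from Exchange. By default it ran multi-threaded, ate several GB of RAM to do the job, and was not even fast. In the end I changed it to single-threaded, and it ran both more smoothly and more stably.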


The reason why you cannot connect to the targets or slowed down is that you are running out o ...
samiux posted on 2018-11-29 08:05



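That is probably the reason. Every time I open more than 100 of them, CPU usage hits 100%...
I tried gevent; the CPU simply could not handle it, and the machine became completely unusable.
I don't know how to solve this.

I tried rewriting it in Golang and there was no problem at all; CPU usage is only a few percent, with the same code.
But Golang really has too few libraries. It is not as convenient as Python, which is a hassle.

Does anyone know whether a crawler written in Java is CPU-hungry? Would it be much better than Python?
It would likewise just be an HTTP client making requests continuously.
I ask because Java has more libraries.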



Python multi-process/multi-thread setups eat a lot of resources; using a library to fetch data in a single thread, one request at a time, is sometimes even faster, ...
super_hkg posted on 2018-11-29 09:53


Python doesn't really have parallel threads because of the famous GIL (only one thread runs Python bytecode at a time). Multi-processing is achieved by forking (if on Linux) or through a low-level library, which is definitely not as flexible as pure Python.
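
To make the GIL point concrete, here is a small sketch of what it does and does not limit. Threads that are only waiting overlap with each other, because blocking calls release the GIL; here `time.sleep` stands in for a network wait. The helper names and the numbers are illustrative assumptions.

```python
import threading
import time


def wait_like_io(delay):
    """Stand-in for a blocking network call; sleeping releases the GIL."""
    time.sleep(delay)


def run_in_threads(n, delay):
    """Start n threads that each wait `delay` seconds; return elapsed seconds."""
    threads = [threading.Thread(target=wait_like_io, args=(delay,))
               for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Twenty threads that each wait 0.2 seconds should finish in well under the 4 seconds a serial loop would need. The GIL hurts when the threads are busy computing in Python, not when they are waiting on sockets.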


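That is probably the reason. Every time I open more than 100 of them, CPU usage hits 100%...
I tried gevent; the CPU simply could not handle it, and I completely could not operate ...
geffon posted on 2018-11-30 07:31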


Golang should be very feasible though. The idea of Golang is not to have many official libraries, but everything can be git-based. You can pretty much import stuff from a (kind of) URL. When you build it, the Go toolchain will fetch the repository, so there should be a lot of community support.


Try 10 first, then slowly increase the number.
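
The ramp-up suggested above can be sketched as a loop that raises the concurrency level step by step and stops once throughput no longer improves. `measure` is a hypothetical callback (not from this thread) that runs a crawl batch at a given level and returns its throughput; the levels and the 10% gain threshold are assumptions.

```python
def find_best_level(measure, levels=(10, 20, 40, 80, 160), min_gain=1.10):
    """Return (level, throughput) of the last level that still helped.

    `measure(level)` must run a crawl batch with `level` concurrent
    workers and return its throughput (e.g. pages per second).
    """
    best_level, best_rate = None, 0.0
    for level in levels:
        rate = measure(level)
        if best_level is not None and rate < best_rate * min_gain:
            break  # adding workers no longer pays off; stop ramping up
        best_level, best_rate = level, rate
    return best_level, best_rate
```

If throughput flattens or drops between two levels, the earlier level is a reasonable place to stay while checking CPU usage and proxy health.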
