Making a forum scraper with several stages in BrowserAutomationStudio


1,836 views.

Brief video description: Hi, in this tutorial I will show you how to make a BrowserAutomationStudio script which has several stages. Each stage has a different thread number: for example, the first stage will run in single-threaded mode and the second stage will run in 10 threads. The script I shall make is a site scraper: it will scrape pages from a forum and also scrape the content of every page. So let's start.

Let's start by loading the forum URL. This is it, and you see that this page contains a list of topics. In the first stage I shall parse all the topics. Let's see the page markup: you see that each topic URL starts with "/topic", so let's use that to actually scrape it. I shall use a loop to do that. BrowserAutomationStudio has several types of selectors; this time I shall use match. Match allows you to query the content and the HTML markup directly.

Okay, now that I have the loop, I need to actually extract the URL from the element. To do that I'll use the "Get element attribute" action, and the attribute that I need is href. I'll change the variable name, and to see if the loop works correctly I will output the URL variable to the log. It displays an error because I didn't actually start the loop and I don't have the URL variable, but if I move the execution point here and start the loop, I'll see some results in the log.

Okay, now I have done the first stage. It works in one thread and works pretty well, but I need to give data to the second stage. Stages can communicate, can exchange data between each other, by using global variables or by using resources. This time I will use a resource. So I'll create a resource which is called "post" before the first action, and in the loop I'll place the URL inside that resource. So together with logging, I will use the "Add element" action in this stage, choose the resource which I just created and add the data. Okay, let's test it again. Now I have the URLs in my resource and I'm ready to do stage two.

Making stages is done with this button with several cubes; it calls the "Call function in several threads" action. But to use it I need a function first, and the software says that no function is found, so let's create one. Creating a function is easy; it is done with this button, "Add function". I will name it "scrap topic content". Okay, the first thing that I need to do inside the function is to actually load the topic URL. Because I saved the topics inside the resource in the first stage, I can get them from the resource in the second stage, and I use the load action to load the page inside the browser. Right now it gives an error because the URL inside the resource is in relative form. It's pretty easy to translate it to absolute form, so I'll just edit the load action and add the domain to the URL. Now you see that the URL has been loaded.

The thing that I want to parse is the topic content, so I just click on it and use the action which is called "Get element text". Okay, I need to select it somehow. I can leave the match selector and it will work, but I want to show you how CSS selectors work. The CSS selector gives several options. Right now you see that the best option is this selector: it selects the element by its CSS class, which is called "content", and obviously it will give the text which I need on any page. I will rename the variable to make it more meaningful, and let's check: if I open the variable inspector, I'll see that the post variable contains the actual post.

Okay, now we need to save this data somehow. BrowserAutomationStudio has different ways of saving data. The most convenient would probably be the database, but for testing purposes I'll use the easiest one: just output it to the log. Okay, that's it. Now I have created stage two, and the only thing which is missing is to actually call stage two. So again I'll click on that button with cubes, and it opens the "Call function in several threads" action. This time I have a function, and the only thing which remains is to set the thread number, the success number and the fail number. I don't want to limit the success number or the fail number, so I put a big number, and the stage will stop by itself because the resource which contains the URLs will end. So we don't need to put a particular number here; we can just leave a very big number. And the thread number, let's leave it at ten. It's okay.

Okay, now the script is done and we need to test it somehow. Obviously it will not work in multi-threaded mode when we use record mode, so I'll use the Run button, and this time it will run in multi-threaded mode. You see that the first stage is being performed and it just parses topic URLs, and now BrowserAutomationStudio has created several browsers, ten browsers, and each one is scraping data from the Qt forum. You see that it outputs the scraped text to the log, and we can check what's going on inside each browser. You see that all the topics that we had on our first page are being processed, and all the browsers become blank. And this one is probably some topic which has been deleted, or access to which is restricted for a user who is not logged in. But it's okay, because this thread will fail in a minute with a timeout saying that it can't find the post content, and our stage will finish anyway, and we still got our results. So if we wait for a minute we will see that our script will end. Yes, here it is.

So here's our log with our posts. I can improve the script more, because I can create a third stage which runs in several threads. I can take a list of forum topics which need to be parsed and place it inside a resource that the user can choose. But for this tutorial that's all. Thank you for watching.
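For readers who think in code, the two-stage flow described in the video can be sketched in plain Python. This is not BrowserAutomationStudio's API, only an analogy under stated assumptions: a `queue.Queue` stands in for the resource, worker threads stand in for the ten browsers, and the forum domain, the page markup and the `load` helper are hypothetical in-memory stand-ins so that the sketch runs without a network.

```python
import queue
import threading
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://forum.example.com/"  # hypothetical domain, not from the video

# Hypothetical in-memory pages standing in for real HTTP responses.
PAGES = {
    BASE_URL: '<a href="/topic/1/first">First</a><a href="/about">About</a>'
              '<a href="/topic/2/second">Second</a><a href="/topic/3/gone">Gone</a>',
    BASE_URL + "topic/1/first": '<div class="content">Hello from topic one</div>',
    BASE_URL + "topic/2/second": '<div class="header">x</div>'
                                 '<div class="content">Hello from topic two</div>',
}


def load(url):
    """Stand-in for loading a page; raises KeyError for a missing topic."""
    return PAGES[url]


class TopicLinkParser(HTMLParser):
    """Stage 1 helper: collects href values that start with /topic."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href and href.startswith("/topic"):
            self.links.append(href)


class ContentParser(HTMLParser):
    """Stage 2 helper: collects text of the first element with class 'content'.

    Simplification: assumes well-nested markup without void tags inside it.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.depth += 1
        elif "content" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def stage_one(posts):
    """Single-threaded stage: put every relative topic URL into the queue."""
    parser = TopicLinkParser()
    parser.feed(load(BASE_URL))
    for href in parser.links:
        posts.put(href)


def scrap_topic_content(posts, results, failures, lock):
    """Worker for stage 2: runs until the queue (the 'resource') is empty."""
    while True:
        try:
            href = posts.get_nowait()
        except queue.Empty:
            return
        try:
            parser = ContentParser()
            parser.feed(load(urljoin(BASE_URL, href)))  # relative -> absolute
            if not parser.done:
                raise LookupError("no element with class 'content'")
            with lock:
                results[href] = "".join(parser.chunks).strip()
        except (KeyError, LookupError):
            with lock:
                failures.append(href)


def run(thread_count=10):
    posts = queue.Queue()
    stage_one(posts)
    results, failures, lock = {}, [], threading.Lock()
    workers = [
        threading.Thread(target=scrap_topic_content,
                         args=(posts, results, failures, lock))
        for _ in range(thread_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results, failures


if __name__ == "__main__":
    scraped, failed = run()
    print(scraped)
    print(failed)
```

As in the video, the second stage has no fixed item count: each worker simply returns once the queue is empty, and a topic that cannot be loaded is recorded as a failure without stopping the other workers.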
