flat assembler
Message board for the users of flat assembler.

How to grab all PDF files?

TmX
Joined: 02 Mar 2006
Posts: 821
Location: Jakarta, Indonesia
There are lots of LLVM-related papers at llvm.org/pubs/.
I was too lazy to download all of them one by one, so I tried wget:

Quote:

# wget -r --no-parent -A.pdf llvm.org/pubs/
--2011-05-31 16:41:52-- http://llvm.org/pubs/
Resolving llvm.org... 128.174.246.134
Connecting to llvm.org|128.174.246.134|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `llvm.org/pubs/index.html'

[ <=> ] 5,889 1.60K/s in 3.6s

2011-05-31 16:41:56 (1.60 KB/s) - `llvm.org/pubs/index.html' saved [5889]

Removing llvm.org/pubs/index.html since it should be rejected.

FINISHED --2011-05-31 16:41:56--
Downloaded: 1 files, 5.8K in 3.6s (1.60 KB/s)


Obviously, it wasn't successful.
Any experienced wget users here?
Post 31 May 2011, 16:47

xleelz
Joined: 12 Mar 2011
Posts: 86
Location: In Google Code Server... waiting for someone to download me
You need the exact URL of each file, and you have to fetch them one file at a time.
The first one I found was: http://llvm.org/pubs/2011-02-FOSDEM-LLVMAndClang.pdf
so it would look like this:

Code:
wget http://llvm.org/pubs/2011-02-FOSDEM-LLVMAndClang.pdf -P /home/<user>/
    


Find the URLs of the other files and repeat this for each of them.

Post 31 May 2011, 17:03

LocoDelAssembly
Your code has a bug
Joined: 06 May 2005
Posts: 4633
Location: Argentina
You would need to save the index, grab all the links from it with some AWK/sed magic (renaming .html to .pdf, if that trick works in all cases), and then feed the output to a bash for loop that calls wget on each link.

I wish I could help, but unfortunately I know neither AWK nor sed.
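
The approach described above can be sketched roughly as follows. This is only an illustration under assumptions: the function names `index_to_pdf_urls` and `download_each` are made up, the base URL is taken from the first post, the links are assumed to be plain relative `href="....html"` attributes, and `grep -o` (a GNU/BSD extension) is assumed to be available. As a later post in this thread notes, the real index page keeps its links in a JS file, so this would not work on llvm.org/pubs/ unchanged.

```shell
#!/bin/sh
# Base URL from the first post in the thread.
BASE_URL="http://llvm.org/pubs/"

# Read HTML on stdin and print one PDF URL per relative .html link.
# Assumes every paper page foo.html has a matching foo.pdf.
index_to_pdf_urls() {
    grep -o 'href="[^"]*\.html"' |      # one href attribute per output line
    grep -v '://' |                     # skip absolute links to other sites
    sed -e 's|^href="||' \
        -e 's|\.html"$|.pdf|' \
        -e "s|^|${BASE_URL}|"
}

# Read URLs on stdin and call wget once per URL (the loop step).
download_each() {
    while IFS= read -r url; do
        wget "$url"
    done
}

# Example (needs network access):
#   wget -q -O index.html "$BASE_URL"
#   index_to_pdf_urls < index.html | download_each
```

Keeping the extraction and the download in separate functions makes it possible to inspect the URL list before anything is fetched.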
Post 31 May 2011, 17:30

LocoDelAssembly
Your code has a bug
Joined: 06 May 2005
Posts: 4633
Location: Argentina
mmmh, it seems wget has built-in functionality for this: http://www.gnu.org/software/wget/manual/html_node/Recursive-Download.html
Post 31 May 2011, 17:38

TmX
Joined: 02 Mar 2006
Posts: 821
Location: Jakarta, Indonesia
LocoDelAssembly wrote:
mmmh, it seems wget has built-in functionality for this: http://www.gnu.org/software/wget/manual/html_node/Recursive-Download.html


Yes, I'm aware of that, which is why I used this:
Quote:
wget -r --no-parent -A.pdf llvm.org/pubs/


BTW, I just took a look at index.html, and surprisingly there are no PDF links in it. In fact, they put all the links in a JS file. Clever. I guess I'll have to do it the old-fashioned way...
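
That would explain the wget log in the first post: in recursive mode wget only follows links it finds in the documents it downloads and does not execute JavaScript, so an index page without PDF links gives it nothing to fetch. A minimal sketch for checking a saved page, assuming `grep -o` is available; the function name `count_pdf_links` is made up for illustration.

```shell
#!/bin/sh
# Read HTML on stdin and print how many href attributes point at a .pdf file.
count_pdf_links() {
    grep -o 'href="[^"]*\.pdf"' |   # one matching attribute per line
    wc -l |
    tr -d ' '                       # some wc implementations pad with spaces
}

# Example (index.html saved beforehand, e.g. with wget):
#   count_pdf_links < index.html
```

A result of 0 for the saved index means the `-A.pdf` filter never had a chance to match anything.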
Post 31 May 2011, 17:51

ManOfSteel
Joined: 02 Feb 2005
Posts: 1154
You're a *nix user, TmX, right?

Code:
% wget http://llvm.org/pubs/pubs.js
% grep 'url: ' pubs.js | awk '{ print $2 }' > new1.txt
% sed "s/\.html',/.pdf/g" new1.txt > new2.txt
% sed 's/\.html",/.pdf/g' new2.txt > new3.txt
% sed 's/"/http:\/\/llvm.org\/pubs\//g' new3.txt > new4.txt
% sed "s/'/http:\/\/llvm.org\/pubs\//g" new4.txt > new5.txt
% # Do some clean-up here if needed
% wget -i new5.txt
    


It's really ugly and could be scripted but I don't have much time to make it prettier. It'll do the job though.
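
A scripted version of the steps above might look something like this. It is a sketch, not a tested replacement: the function name `pubs_to_pdf_urls` is made up, and it assumes that each entry in pubs.js has the form `url: "name.html",` or `url: 'name.html',`, which is what the grep and sed patterns above imply. Entries whose URL already starts with http are skipped rather than rewritten.

```shell
#!/bin/sh
# Base URL from the thread.
BASE_URL="http://llvm.org/pubs/"

# Read pubs.js-style text on stdin and print one PDF URL per line.
pubs_to_pdf_urls() {
    sed -n \
        -e "/url: *[\"']http/d" \
        -e "s|.*url: *[\"']\([^\"']*\)\.html[\"'].*|${BASE_URL}\1.pdf|p"
}

# Example (needs network access):
#   wget -q -O pubs.js "${BASE_URL}pubs.js"
#   pubs_to_pdf_urls < pubs.js > urls.txt
#   wget -i urls.txt
```

Using `|` as the sed delimiter avoids having to escape every slash in the URL, and a single pass replaces the chain of intermediate new1.txt to new5.txt files.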
Post 31 May 2011, 19:22

LocoDelAssembly
Your code has a bug
Joined: 06 May 2005
Posts: 4633
Location: Argentina
Silly me, I didn't read your console dump carefully enough, sorry.

I tried something with sed on the JS file you pointed to. I did manage to build the links and pass them to wget, but for some reason all of them ended with "%20" (as shown in wget's output), and I couldn't find a way to get rid of the trailing space (because I still don't know sed).

I have turned off the virtual machine. If you wish to continue, I can turn it back on and copy what I did so far, but I think you will save time by just downloading them by hand (as you have probably done already).

[edit]OK, it seems ManOfSteel got it. I used something like this when trying: wget $(cat pubs.js | grep '{url: ' | sed ...)[/edit]
[edit2]I see my grep pattern would have missed some links...[/edit2]
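
For the trailing "%20" problem, one common way to drop whitespace at the end of each line is a small sed filter. This is a sketch with a made-up function name, `strip_trailing_space`; it does not attempt to explain where the extra space came from in the original attempt.

```shell
#!/bin/sh
# Read lines on stdin and remove any spaces, tabs or carriage returns
# at the end of each line.
strip_trailing_space() {
    sed 's/[[:space:]]*$//'
}

# Example:
#   some_command_that_prints_urls | strip_trailing_space > urls.txt
```

The `[[:space:]]` class also covers a stray carriage return, which can show up when the source file uses Windows line endings.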
Post 31 May 2011, 19:28

Tyler
Joined: 19 Nov 2009
Posts: 1216
Location: NC, USA
If you use FF, you may like this plugin.
Post 31 May 2011, 19:51

Enko
Joined: 03 Apr 2007
Posts: 678
Location: Mar del Plata
http://llvm.org/pubs/2002-06-AutomaticPoolAllocation.pdf
http://llvm.org/pubs/2002-08-08-CASES02-ControlC.pdf
http://llvm.org/pubs/2002-12-LattnerMSThesis.pdf
http://llvm.org/pubs/2003-04-29-DataStructureAnalysisTR.pdf
http://llvm.org/pubs/2003-05-01-GCCSummit2003.pdf
http://llvm.org/pubs/2003-05-05-LCTES03-CodeSafety.pdf
http://llvm.org/pubs/2003-07-18-ShuklaMSThesis.pdf
http://llvm.org/pubs/2003-07-18-StanleyMSThesis.pdf
http://llvm.org/pubs/2003-10-01-LLVA.pdf
http://llvm.org/pubs/2004-01-30-CGO-LLVM.pdf
http://llvm.org/pubs/2004-03-ICDCS-Adaptions.pdf
http://llvm.org/pubs/2004-05-JoshiMSThesis.pdf
http://llvm.org/pubs/2004-09-22-LCPCLLVMTutorial.pdf
http://llvm.org/pubs/2004-Spring-AlexanderssonMSThesis.pdf
http://llvm.org/pubs/2005-02-TECS-SAFECode.pdf
http://llvm.org/pubs/2005-03-14-ACP4IS-AspectsKernel.pdf
http://llvm.org/pubs/2005-05-04-LattnerPHDThesis.pdf
http://llvm.org/pubs/2005-05-21-PLDI-PoolAlloc.pdf
http://llvm.org/pubs/2005-06-12-MSP-PointerComp.pdf
http://llvm.org/pubs/2005-06-17-LattnerMSThesis.pdf
http://llvm.org/pubs/2005-07-IDEAS-PerfEstimation.pdf
http://llvm.org/pubs/2005-07-ZimmermanMSThesis.pdf
http://llvm.org/pubs/2005-08-EUROPAR-PerformanceLibs.pdf
http://llvm.org/pubs/2005-09-25-CASES05-SegmentProtection.pdf
http://llvm.org/pubs/2005-09-PASTE-GreedySuiteMinimization.pdf
http://llvm.org/pubs/2005-10-20-LCPC-RegAlloc.pdf
http://llvm.org/pubs/2005-11-SAFECodeTR.pdf
http://llvm.org/pubs/2005-TR-DSAEvaluation.pdf
http://llvm.org/pubs/2006-01-LabrecqueMSThesis.pdf
http://llvm.org/pubs/2006-04-04-CGO-GraphColoring.pdf
http://llvm.org/pubs/2006-04-25-GelatoLLVMIntro.pdf
http://llvm.org/pubs/2006-05-24-SAFECode-BoundsCheck.pdf
http://llvm.org/pubs/2006-06-07-LewyckyChecker.pdf
http://llvm.org/pubs/2006-06-12-PLDI-SAFECode.pdf
http://llvm.org/pubs/2006-06-15-VEE-VectorLLVA.pdf
http://llvm.org/pubs/2006-06-18-WIOSCA-LLVAOS.pdf
http://llvm.org/pubs/2006-09-SOC-Synthesis.pdf
http://llvm.org/pubs/2006-10-CASES-IncreaseMem.pdf
http://llvm.org/pubs/2006-10-DLS-PyPy.pdf
http://llvm.org/pubs/2006-10-ICNPC-ScalingTaskGraphs.pdf
http://llvm.org/pubs/2006-DSN-DanglingPointers.pdf
http://llvm.org/pubs/2007-01-SaP-Security.pdf
http://llvm.org/pubs/2007-03-12-BossaLLVMIntro.pdf
http://llvm.org/pubs/2007-03-Computer-Trident.pdf
http://llvm.org/pubs/2007-03-SPLAT-Aspects.pdf
http://llvm.org/pubs/2007-04-PraherMSThesis.pdf
http://llvm.org/pubs/2007-04-SCOPES-ChainRulePlacement.pdf
http://llvm.org/pubs/2007-05-31-Switch-Lowering.pdf
http://llvm.org/pubs/2007-06-10-PLDI-DSA.pdf
http://llvm.org/pubs/2007-07-25-LLVM-2.0-and-Beyond.pdf
http://llvm.org/pubs/2007-07-CAV-StructuralAbstraction.pdf
http://llvm.org/pubs/2007-07-SCSC-Simulation.pdf
http://llvm.org/pubs/2007-08-16-TRANSACT-Tanger.pdf
http://llvm.org/pubs/2007-09-ESEC-FSE-DesignOptzn.pdf
http://llvm.org/pubs/2007-10-DLS-RPython.pdf
http://llvm.org/pubs/2007-10-PekkaTTA.pdf
http://llvm.org/pubs/2007-SOSP-SVA.pdf
http://llvm.org/pubs/2008-02-23-TRANSACT-TangerObjBased.pdf
http://llvm.org/pubs/2008-02-ImpedingMalwareAnalysis.pdf
http://llvm.org/pubs/2008-03-ASPLOS-HardErrorPropagation.pdf
http://llvm.org/pubs/2008-03-DATE-TLM_Estimation.pdf
http://llvm.org/pubs/2008-03-SAC-SoftwareFaults.pdf
http://llvm.org/pubs/2008-03-TR-UIDependAnalysis.pdf
http://llvm.org/pubs/2008-05-17-BSDCan-LLVMIntro.pdf
http://llvm.org/pubs/2008-05-CoVaC.pdf
http://llvm.org/pubs/2008-05-ISCE-Calysto.pdf
http://llvm.org/pubs/2008-06-13-SPAA-STMDataPartitioning.pdf
http://llvm.org/pubs/2008-06-CompilingHaskelltoLLVM.pdf
http://llvm.org/pubs/2008-06-DSNPDS-ErrorDerating.pdf
http://llvm.org/pubs/2008-06-LCTES-ISelUsingSSAGraphs.pdf
http://llvm.org/pubs/2008-06-PLDI-PuzzleSolving.pdf
http://llvm.org/pubs/2008-06-Reiter-Thesis.pdf
http://llvm.org/pubs/2008-06-SAW-Parfait.pdf
http://llvm.org/pubs/2008-07-RSSI-CHiMPS.pdf
http://llvm.org/pubs/2008-08-RTCodegen.pdf
http://llvm.org/pubs/2008-08-SPIN-Pancam.pdf
http://llvm.org/pubs/2008-09-ASE-FrameAxioms.pdf
http://llvm.org/pubs/2008-09-LadyVM.pdf
http://llvm.org/pubs/2008-09-Lightspark.pdf
http://llvm.org/pubs/2008-10-04-ACAT-LLVM-Intro.pdf
http://llvm.org/pubs/2008-10-CASES-ExecutionContextOptimization.pdf
http://llvm.org/pubs/2008-10-EMSOFT-Volatiles.pdf
http://llvm.org/pubs/2008-11-ICCAD-MCSim.pdf
http://llvm.org/pubs/2008-11-MICRO-CopyOrDiscard.pdf
http://llvm.org/pubs/2008-11-PASTE-CompilerValidation.pdf
http://llvm.org/pubs/2008-12-OSDI-KLEE.pdf
http://llvm.org/pubs/2008-CGO-DagISel.pdf
http://llvm.org/pubs/2009-01-ASP-DAC-Automatic_Instrumentation.pdf
http://llvm.org/pubs/2009-01-ASP-DAC-MemorySimulation.pdf
http://llvm.org/pubs/2009-01-Pattabiraman-Thesis.pdf
http://llvm.org/pubs/2009-01-PEPM-Parfait.pdf
http://llvm.org/pubs/2009-01-POPL-PointerAnalysis.pdf
http://llvm.org/pubs/2009-01-VMCAI-ScalableMemoryModel.pdf
http://llvm.org/pubs/2009-02-PPoPP-MappingParallelism.pdf
http://llvm.org/pubs/2009-03-ACMSE-Superpage.pdf
http://llvm.org/pubs/2009-03-ASPLOS-DMP.pdf
http://llvm.org/pubs/2009-03-ASPLOS-Recovery.pdf
http://llvm.org/pubs/2009-03-CGO-ESoftCheck.pdf
http://llvm.org/pubs/2009-04-SCOPES-RegisterAllocationDeconstructed.pdf
http://llvm.org/pubs/2009-04-SCOPES-SimulationOfInterruptsWithRollback.pdf
http://llvm.org/pubs/2009-04-TECS-MEMMU.pdf
http://llvm.org/pubs/2009-05-21-Thesis-Barrett-3c.pdf
http://llvm.org/pubs/2009-05-EnsuringCorrectnessOfCompiledCode.pdf
http://llvm.org/pubs/2009-05-IWMSE-COMPASS.pdf
http://llvm.org/pubs/2009-06-27-edwin.pdf
http://llvm.org/pubs/2009-06-DSReplacement.pdf
http://llvm.org/pubs/2009-06-HotDep-SymbolicExec.pdf
http://llvm.org/pubs/2009-06-ISC-DataMiningOnGPUs.pdf
http://llvm.org/pubs/2009-06-JPP-SpeculativeParallel.pdf
http://llvm.org/pubs/2009-06-MansinghkaThesis.pdf
http://llvm.org/pubs/2009-06-PLDI-LibraryBindings.pdf
http://llvm.org/pubs/2009-06-PLDI-Parallelizing.pdf
http://llvm.org/pubs/2009-06-PLDI-SoftBound.pdf
http://llvm.org/pubs/2009-07-ISSTA-BegBunch.pdf
http://llvm.org/pubs/2009-07-Karrenberg-Thesis.pdf
http://llvm.org/pubs/2009-08-12-UsenixSecurity-SafeSVAOS.pdf
http://llvm.org/pubs/2009-08-EUROPAR-Blame.pdf
http://llvm.org/pubs/2009-08-FSE-Altair.pdf
http://llvm.org/pubs/2009-08-ISLPED.pdf
http://llvm.org/pubs/2009-08-RehmeThesis.pdf
http://llvm.org/pubs/2009-08-SAS-IPSSA.pdf
http://llvm.org/pubs/2009-08-Zoltar.pdf
http://llvm.org/pubs/2009-09-SBCCI.pdf
http://llvm.org/pubs/2009-10-CASES-ProgressiveSpill.pdf
http://llvm.org/pubs/2009-10-CODES-MPSoC.pdf
http://llvm.org/pubs/2009-10-CODES-TotalProf.pdf
http://llvm.org/pubs/2009-10-LCPC-DataRestructuring.pdf
http://llvm.org/pubs/2009-10-TereiThesis.pdf
http://llvm.org/pubs/2009-12-MICRO-DDT.pdf
http://llvm.org/pubs/2010-01-Wennborg-Thesis.pdf
http://llvm.org/pubs/2010-02-FPGA-BitLevel.pdf
http://llvm.org/pubs/2010-02-IPSJ-CustomInstruction.pdf
http://llvm.org/pubs/2010-03-ASPLOS-ConservationCores.pdf
http://llvm.org/pubs/2010-03-ASPLOS-Orthrus.pdf
http://llvm.org/pubs/2010-03-ASPLOS-Shoestring.pdf
http://llvm.org/pubs/2010-03-ASPLOS-SpeculativeParallelization.pdf
http://llvm.org/pubs/2010-03-GPGPU-ModelingGPGPU.pdf
http://llvm.org/pubs/2010-03-VEE-VMKit.pdf
http://llvm.org/pubs/2010-04-ASPLOS-DeterministicCompiler.pdf
http://llvm.org/pubs/2010-04-EUROSYS-DresdenTM.pdf
http://llvm.org/pubs/2010-04-EUROSYS-ExecutionSynthesis.pdf
http://llvm.org/pubs/2010-04-EUROSYS-Returnless.pdf
http://llvm.org/pubs/2010-04-EUROSYS-RevNIC.pdf
http://llvm.org/pubs/2010-04-NeustifterProfiling.pdf
http://llvm.org/pubs/2010-05-01-ClangBSD.pdf
http://llvm.org/pubs/2010-05-ICSE-QualityOfService.pdf
http://llvm.org/pubs/2010-05-Oakland-HyperSafe.pdf
http://llvm.org/pubs/2010-06-06-Clang-LLVM.pdf
http://llvm.org/pubs/2010-06-ISCA-Relax.pdf
http://llvm.org/pubs/2010-06-ISMM-CETS.pdf
http://llvm.org/pubs/2010-06-ISMM-SpeculativeParallelization.pdf
http://llvm.org/pubs/2010-07-CAV-LazyAnnot.pdf
http://llvm.org/pubs/2010-08-SBLP-SSI.pdf
http://llvm.org/pubs/2010-09-ESORICS-FixOverflows.pdf
http://llvm.org/pubs/2010-09-HASKELLSYM-LLVM-GHC.pdf
http://llvm.org/pubs/2010-10-HotDep-CrashRecovery.pdf
http://llvm.org/pubs/2010-10-OSDI-BypassingRaces.pdf
http://llvm.org/pubs/2010-10-OSDI-DeterministicMT.pdf
http://llvm.org/pubs/2010-12-Preuss-PathProfiling.pdf
http://llvm.org/pubs/2011-02-FOSDEM-LLVMAndClang.pdf
Post 31 May 2011, 20:37

Enko
Joined: 03 Apr 2007
Posts: 678
Location: Mar del Plata
In Windows:

Firefox >> FlashGot >> Download them all

Select a folder and wait a minute.

In a command prompt in that folder:
dir /b *.html >> file.txt


Open file.txt in Notepad and replace every .html with .pdf.

Paste the result into column B in Excel.

In column A, paste http://llvm.org/pubs/

Select everything and paste it into the forum.



Or, more easily, you could use a web crawler tool.
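
On a *nix system the Notepad and Excel steps could be replaced by a short loop over the downloaded files. This is a sketch under assumptions: the function name `html_files_to_pdf_urls` is made up, the .html pages are assumed to already be in one folder, and every foo.html is assumed to have a matching foo.pdf under the base URL.

```shell
#!/bin/sh
# Base URL from the thread.
BASE_URL="http://llvm.org/pubs/"

# Print a PDF URL for every .html file in the given directory
# (defaults to the current directory).
html_files_to_pdf_urls() {
    dir=${1:-.}
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue          # no .html files: the glob stays literal
        name=$(basename "$f" .html)      # strip the directory and the extension
        printf '%s%s.pdf\n' "$BASE_URL" "$name"
    done
}

# Example:
#   html_files_to_pdf_urls /path/to/downloaded/pages > urls.txt
```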
Post 31 May 2011, 20:42

Overflowz
Joined: 03 Sep 2010
Posts: 1046
You can try this one:
Code:
wget -r -nd -nH -A pdf -e robots=off http://llvm.org/pubs    

You will have to wait, because it crawls the site first and only downloads the PDFs once it has found them.
Post 31 May 2011, 21:23

TmX
Joined: 02 Mar 2006
Posts: 821
Location: Jakarta, Indonesia
ManOfSteel wrote:
You're a *nix user, TmX, right?



Yes, I use Linux in my spare time, but I'm not familiar with sed, awk, and co. yet.
Post 01 Jun 2011, 03:46

Raedwulf
Joined: 13 Jul 2005
Posts: 375
Location: United Kingdom
Possibly cleaner if I use awk...
Code:
curl -s "http://llvm.org/pubs/pubs.js" | grep "url:" | grep -v "http" | sed "/url:/ {s/[^'\"]*['\"]\([a-zA-Z0-9\._-]*\).*/\1/}" | sed 's!\([a-zA-Z0-9\._-]*\)\.html!http://llvm.org/pubs/\1.pdf!'
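
For comparison, an awk version might look roughly like this. It is only a sketch: the function name `pubs_to_pdf_urls_awk` is hypothetical, and it assumes that the file name is the first quoted string on each `url:` line of pubs.js, which the commands in this thread suggest but do not show.

```shell
#!/bin/sh
# Base URL from the thread.
BASE_URL="http://llvm.org/pubs/"

# Read pubs.js-style text on stdin and print one PDF URL per line.
# Fields are split on single or double quotes, so $2 is the first quoted string.
pubs_to_pdf_urls_awk() {
    awk -F"['\"]" -v base="$BASE_URL" '
        /url:/ && $2 !~ /^http/ && $2 ~ /\.html$/ {
            sub(/\.html$/, ".pdf", $2)   # foo.html -> foo.pdf
            print base $2
        }'
}

# Example (needs network access):
#   curl -s "${BASE_URL}pubs.js" | pubs_to_pdf_urls_awk | wget -i -
```

Like the pipeline above, it leaves out entries whose URL already starts with http.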
    

Post 02 Jun 2011, 04:20
Copyright © 1999-2020, Tomasz Grysztar.