4/1/2013
Larry W. Cashdollar
@_larry0
User-supplied input is not sanitized for shell metacharacters before it is interpolated into a command string and passed to the shell. If a user is tricked into extracting text from a file whose name contains shell metacharacters, an attacker can execute arbitrary commands with that user's privileges.
https://rubygems.org/gems/karteek-docsplit
./karteek-docsplit-0.5.4/lib/docsplit/text_extractor.rb
def extract_from_ocr(pdf, pages)
  tempdir = Dir.mktmpdir
  base_path = File.join(@output, @pdf_name)
  if pages
    pages.each do |page|
      tiff = "#{tempdir}/#{@pdf_name}_#{page}.tif"
      file = "#{base_path}_#{page}"
      run "MAGICK_TMPDIR=#{tempdir} OMP_NUM_THREADS=2 gm convert -despeckle +adjoin #{MEMORY_ARGS} #{OCR_FLAGS} #{pdf}[#{page - 1}] #{tiff} 2>&1"
      run "tesseract #{tiff} #{file} -l eng 2>&1"
      clean_text(file + '.txt') if @clean_ocr
      FileUtils.remove_entry_secure tiff
    end
  else
    tiff = "#{tempdir}/#{@pdf_name}.tif"
    run "MAGICK_TMPDIR=#{tempdir} OMP_NUM_THREADS=2 gm convert -despeckle #{MEMORY_ARGS} #{OCR_FLAGS} #{pdf} #{tiff} 2>&1"
    run "tesseract #{tiff} #{base_path} -l eng 2>&1"
    clean_text(base_path + '.txt') if @clean_ocr
  end
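To illustrate why the interpolation above is a problem, here is a minimal sketch that rebuilds the command string from the non-paged branch using a hostile file name. The helper name build_command, the example file name, and the placeholder values for the memory and OCR flags are assumptions made for illustration, not the gem's actual values. Nothing is executed; the string is only printed.

```ruby
# Illustrative only: builds the command string and prints it, never runs it.
# build_command is a hypothetical helper; the default flag values are
# placeholders (assumptions), not the gem's real MEMORY_ARGS / OCR_FLAGS.
def build_command(pdf, tempdir, pdf_name,
                  memory_args: "-limit memory 256MiB",
                  ocr_flags: "-density 400x400")
  tiff = "#{tempdir}/#{pdf_name}.tif"
  "MAGICK_TMPDIR=#{tempdir} OMP_NUM_THREADS=2 gm convert -despeckle " \
    "#{memory_args} #{ocr_flags} #{pdf} #{tiff} 2>&1"
end

# Hypothetical file name containing shell metacharacters.
hostile = "report;id;.pdf"
command = build_command(hostile, "/tmp/demo", File.basename(hostile, ".pdf"))

# The semicolons survive unescaped, so a shell would treat "id" as a
# separate command rather than as part of the file name.
puts command
```

Because the resulting string is handed to the shell as-is, every `;`, backtick, `$(...)`, or `|` in the file name is interpreted by the shell instead of being treated as literal text.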
The run method is defined as:
def run(command)
  result = `#{command}`
  raise ExtractionFailed, result if $? != 0
  result
end
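One possible mitigation, sketched here under assumptions and not taken from the gem or any vendor patch, is to avoid the shell entirely by passing the program and its arguments as separate strings. The method name run_safely and the generic RuntimeError are hypothetical stand-ins; `echo` is used only as a harmless external command for demonstration. Open3.capture2e merges stdout and stderr, which roughly corresponds to the `2>&1` in the original commands.

```ruby
require "open3"
require "shellwords"

# Hypothetical replacement for run: the program and each argument are
# passed as separate strings, so no shell parses them.
def run_safely(*argv)
  output, status = Open3.capture2e(*argv)
  raise "command failed: #{output}" unless status.success?
  output
end

# Hypothetical file name containing shell metacharacters.
hostile = "report;id;.pdf"

# The name reaches the program as one literal argument; "id" is not run.
puts run_safely("echo", hostile)

# Where a single command string is unavoidable, escape each interpolated value.
puts Shellwords.escape(hostile)  # → report\;id\;.pdf
```

Environment variables such as MAGICK_TMPDIR can be supplied to Open3.capture2e as a leading hash argument instead of being prefixed to a command string.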
//The information contained within this publication is
//supplied "as-is" with no warranties or guarantees of fitness
//of use or otherwise. Neither Bot24, Inc nor Bradley Sean Susser accepts
//responsibility for any damage caused by the use or misuse of
//this information.