Lua - Working with Large Files



Lua's I/O library handles files efficiently. A large file (a few megabytes in size) can be read easily using io.read(), as shown below.

Reading a Large File in One Shot

We can read the entire content of the default input file in a single statement using the following command.

content = io.read ("*all")

or, equivalently, using the shorter form

content = io.read ("*a")
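
The io.read() calls above read from the default input file. As a minimal sketch, the same whole-file read can be done on an explicitly opened file handle. The helper name readAll and the temporary sample file are assumptions made for illustration; they are not part of the original tutorial.

```lua
-- Hypothetical helper: read an entire file into one string.
-- Returns the content, or nil plus an error message if the file cannot be opened.
local function readAll(path)
   local f, err = io.open(path, "r")
   if not f then
      return nil, err
   end
   local content = f:read("*a")   -- "*a" reads everything up to end of file
   f:close()
   return content
end

-- Create a small sample file so the sketch is self-contained.
local path = os.tmpname()
local out = assert(io.open(path, "w"))
out:write("first line\nsecond line\n")
out:close()

local content = assert(readAll(path))
print(#content)                   -- size of the content in bytes
os.remove(path)
```

Opening the file explicitly leaves the default input untouched and lets the caller handle a missing file instead of failing inside io.read().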

Reading in one go is the fastest way to read a file in Lua, but loading a very large file as a whole can exhaust memory and hamper system performance. Instead, we can read the file in blocks of a reasonable size (8 KB), extending each block to the end of the current line so that no line is split, using the following syntax:

local lines, rest = f:read(BUFSIZE, "*line")

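where:

  • BUFSIZE - the block size in bytes; 8 KB can be defined as 2^13.

  • *line - format that reads the rest of the current line, without its newline character.

  • lines - the block of up to BUFSIZE bytes returned by the read call; nil at the end of the file.

  • rest - the text from the end of the block up to the next newline; nil if the block already reached the end of the file.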
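
To make the roles of the two return values concrete, here is a small sketch that reads a two-line sample file with a deliberately tiny block size. The block size of 4 bytes, the sample text, and the chunks table are assumptions chosen for illustration only; a real program would use a block size such as 8 KB.

```lua
-- Write a small sample file so the sketch is self-contained.
local path = os.tmpname()
local out = assert(io.open(path, "w"))
out:write("alpha beta\ngamma delta\n")
out:close()

local f = assert(io.open(path, "r"))
local BUFSIZE = 4          -- deliberately tiny so every block ends mid-line
local chunks = {}

while true do
   -- block: up to BUFSIZE bytes; rest: remainder of the line the block ended in
   local block, rest = f:read(BUFSIZE, "*line")
   if not block then
      break                -- end of file
   end
   if rest then
      -- "*line" strips the newline, so put it back after the remainder
      block = block .. rest .. "\n"
   end
   chunks[#chunks + 1] = block
end

f:close()
os.remove(path)

for i, chunk in ipairs(chunks) do
   io.write(i, ": ", chunk)
end
```

Because every chunk is extended to a line boundary, no word or line is ever split between two chunks, which is what makes per-chunk counting safe.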

Example - Counting Lines/Words/Characters of a File

In the following example, we count the lines, words and characters of a large file using this buffering technique.

main.lua

-- Open the file for reading; stop with an error message if it cannot be opened
local f = assert(io.open("example.txt", "r"))

-- Define a block size of 8 KB (2^13 = 8192 bytes)
local BUFSIZE = 2^13

local charCounter, lineCounter, wordCounter = 0, 0, 0

while true do
   -- Read a block of up to 8 KB, then the remainder of the current line
   local lines, rest = f:read(BUFSIZE, "*line")

   -- No more content
   if not lines then
      break
   end

   -- If the block was followed by more text on the same line,
   -- append it along with the newline that "*line" strips
   if rest then
      lines = lines .. rest .. "\n"
   end

   -- Count characters (bytes) in the buffer
   charCounter = charCounter + #lines
   -- Count words (runs of non-space characters) in the buffer
   local _, t = string.gsub(lines, "%S+", "")
   wordCounter = wordCounter + t
   -- Count newlines in the buffer
   _, t = string.gsub(lines, "\n", "\n")
   lineCounter = lineCounter + t
end

f:close()

print("Lines ", lineCounter)
print("Words ", wordCounter)
print("Characters ", charCounter)

Output

When the above code is run, it produces output like the following (the actual counts depend on the contents of example.txt):

Lines 	792
Words 	2376
Characters 	20196