A Wander through GHC’s New IO library Simon Marlow


Posted on 01-Apr-2015


Page 1

A Wander through GHC’s New IO library

Simon Marlow

Page 2

The 100-mile view

• The API changes:
  – Unicode
    • putStr "A légpárnás hajóm tele van angolnákkal" works! (Hungarian for "My hovercraft is full of eels"; if your editor is set up right…)
    • locale encoding by default, except for Handles in binary mode (openBinaryFile, hSetBinaryMode)
    • changing the encoding on the fly:

hSetEncoding :: Handle -> TextEncoding -> IO ()
hGetEncoding :: Handle -> IO (Maybe TextEncoding)

data TextEncoding
latin1, utf8, utf16, utf32, … :: TextEncoding
mkTextEncoding :: String -> IO TextEncoding
localeEncoding :: TextEncoding
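A minimal sketch of changing encodings on the fly, using only the System.IO functions listed above: it writes a string as UTF-8, then reads it back twice, once as UTF-8 and once as Latin-1. The helper writeThenRead and the file name are hypothetical.

```haskell
import System.IO

-- Hypothetical helper: write s using one encoding, then read the same file
-- back through a Handle set to a (possibly different) encoding.
writeThenRead :: FilePath -> TextEncoding -> TextEncoding -> String -> IO String
writeThenRead path wEnc rEnc s = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h wEnc
    hPutStr h s
  withFile path ReadMode $ \h -> do
    hSetEncoding h rEnc
    r <- hGetContents h
    length r `seq` return r   -- force the lazy read before withFile closes h

main :: IO ()
main = do
  let s = "A légpárnás hajóm tele van angolnákkal"
  back <- writeThenRead "encoding-demo.txt" utf8 utf8 s
  print (back == s)                  -- True
  raw <- writeThenRead "encoding-demo.txt" utf8 latin1 s
  print (length s, length raw)       -- (38,43)
```

Decoding UTF-8 bytes as Latin-1 yields one Char per byte, so each of the five accented letters (two bytes each in UTF-8) comes back as two Chars.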

Page 3

The 100-mile view (cont.)

• Better newline support
  – teletypes needed both CR+LF to start a new line, and we’ve been paying for it ever since.

hSetNewlineMode :: Handle -> NewlineMode -> IO ()

data Newline = LF {- "\n" -} | CRLF {- "\r\n" -}
nativeNewline :: Newline

data NewlineMode = NewlineMode { inputNL :: Newline, outputNL :: Newline }

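-- no translation in either direction
noNewlineTranslation = NewlineMode { inputNL = LF, outputNL = LF }
-- read "\r\n" (or "\n") as '\n'; write the platform's native newline
universalNewlineMode = NewlineMode { inputNL = CRLF, outputNL = nativeNewline }
-- the platform's native newline in both directions
nativeNewlineMode = NewlineMode { inputNL = nativeNewline, outputNL = nativeNewline }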
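A minimal sketch of the newline API: force CRLF line endings on output regardless of platform, then read the file back with and without input translation. writeWithCRLF and readWithMode are hypothetical helpers; the NewlineMode values are the ones defined above.

```haskell
import System.IO

-- Hypothetical helper: write s, turning every '\n' into "\r\n" on the way out.
writeWithCRLF :: FilePath -> String -> IO ()
writeWithCRLF path s = withFile path WriteMode $ \h -> do
  hSetNewlineMode h (NewlineMode { inputNL = LF, outputNL = CRLF })
  hPutStr h s

-- Hypothetical helper: read a whole file under the given newline mode.
readWithMode :: NewlineMode -> FilePath -> IO String
readWithMode mode path = withFile path ReadMode $ \h -> do
  hSetNewlineMode h mode
  r <- hGetContents h
  length r `seq` return r   -- force the read before the Handle is closed

main :: IO ()
main = do
  writeWithCRLF "newline-demo.txt" "one\ntwo\n"
  readWithMode noNewlineTranslation "newline-demo.txt" >>= print  -- "one\r\ntwo\r\n"
  readWithMode universalNewlineMode "newline-demo.txt" >>= print  -- "one\ntwo\n"
```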

Page 4

The 10-mile view

• Unicode codecs:
  – built-in codecs for UTF-8, UTF-16 (LE, BE), UTF-32 (LE, BE)
  – other codecs use iconv on Unix systems
  – built-in codecs only on Windows (no code pages)
• yet…
  – the pieces for building a codec are provided…
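Codecs can also be looked up by name with mkTextEncoding. The sketch below assumes the built-in UTF names are accepted in their usual hyphenated spelling; it writes the same five-character string through three codecs and reports the resulting file sizes. bytesWritten and the file name are hypothetical.

```haskell
import System.IO

-- Hypothetical helper: write s through the codec called codecName and
-- return the size of the resulting file in bytes.
bytesWritten :: String -> String -> IO Integer
bytesWritten codecName s = do
  enc <- mkTextEncoding codecName   -- non-built-in names go to iconv on Unix
  withFile "codec-demo.txt" WriteMode $ \h -> do
    hSetEncoding h enc
    hPutStr h s
  withBinaryFile "codec-demo.txt" ReadMode hFileSize

main :: IO ()
main = mapM_ report ["UTF-8", "UTF-16LE", "UTF-32BE"]   -- 6, 10 and 20 bytes
  where
    report name = do
      n <- bytesWritten name "héllo"
      putStrLn (name ++ ": " ++ show n ++ " bytes")
```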

Page 5

The 10-mile view

• Build your own codec: API in GHC.IO.Encoding

data BufferCodec from to state = BufferCodec {
    encode   :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
    close    :: IO (),
    getState :: IO state,
    setState :: state -> IO ()
  }

type TextEncoder state = BufferCodec Char Word8 state
type TextDecoder state = BufferCodec Word8 Char state

data TextEncoding = forall dstate estate . TextEncoding {
    mkTextDecoder :: IO (TextDecoder dstate),
    mkTextEncoder :: IO (TextEncoder estate)
  }

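Saving and restoring the codec state is important because Handles support buffering, random access, and changing encodings: if a Handle throws away characters it has decoded ahead, the decoder must be rolled back to the state it had at that point.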
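The sketch below is a hypothetical, list-based model of the BufferCodec record; it does not use GHC.IO.Encoding or real Buffers. It implements a toy UTF-16 decoder (BMP only, no surrogate pairs) whose state records the byte order once a byte-order mark has been seen, and it shows why getState/setState matter: a decoder recreated after a seek only decodes correctly if the saved state is restored.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (chr)
import Data.IORef
import Data.Word (Word8)

-- Hypothetical list-based analogue of BufferCodec: 'encode' consumes what it
-- can and returns the leftover input together with the output it produced.
data Codec from to state = Codec
  { encode   :: [from] -> IO ([from], [to])
  , getState :: IO state
  , setState :: state -> IO ()
  }

data Endian = BE | LE deriving (Eq, Show)

-- Pure decoding step; the state Nothing means "byte order not known yet".
decodeStep :: Maybe Endian -> [Word8] -> (Maybe Endian, [Word8], String)
decodeStep st0 bytes0 = go st0 bytes0 []
  where
    go Nothing (0xFE : 0xFF : rest) out = go (Just BE) rest out
    go Nothing (0xFF : 0xFE : rest) out = go (Just LE) rest out
    go Nothing bs@(_ : _ : _)       out = go (Just BE) bs out  -- no BOM: assume big-endian
    go (Just e) (a : b : rest)      out = go (Just e) rest (unit e a b : out)
    go st leftover                  out = (st, leftover, reverse out)  -- < 2 bytes left

    unit BE hi lo = toChar hi lo
    unit LE lo hi = toChar hi lo
    toChar hi lo = chr ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo)

-- Hypothetical constructor: the decoder's state lives in an IORef.
mkUtf16Decoder :: IO (Codec Word8 Char (Maybe Endian))
mkUtf16Decoder = do
  ref <- newIORef Nothing
  return Codec
    { encode = \input -> do
        st <- readIORef ref
        let (st', leftover, out) = decodeStep st input
        writeIORef ref st'
        return (leftover, out)
    , getState = readIORef ref
    , setState = writeIORef ref
    }

main :: IO ()
main = do
  d <- mkUtf16Decoder
  -- BOM FF FE (little-endian), then 'h' as 68 00, then half of the next unit.
  (rest, out) <- encode d [0xFF, 0xFE, 0x68, 0x00, 0x69]
  print (rest, out)                  -- ([105],"h")
  saved <- getState d                -- Just LE
  fresh <- mkUtf16Decoder            -- e.g. a decoder recreated after a seek
  (_, wrong) <- encode fresh [0x21, 0x00]
  restored <- mkUtf16Decoder
  setState restored saved
  (_, right) <- encode restored [0x21, 0x00]
  print (wrong, right)               -- ("\8448","!")
```

Note how the unconsumed half code unit is handed back as leftover input rather than kept in the state, mirroring how BufferCodec returns the remaining input buffer.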

Page 6

The 1-mile view

• Make your own Handles!

– why mkFileHandle, not mkHandle?

mkFileHandle :: (IODevice dev, BufferedIO dev, Typeable dev) => dev -> FilePath -> IOMode -> Maybe TextEncoding -> NewlineMode -> IO Handle

• IODevice: type class providing I/O device operations: close, seek, getSize, …
• BufferedIO: type class providing buffered reading/writing
• Typeable: in case we need to take the Handle apart again later
• FilePath: for error messages
• IOMode: ReadMode/WriteMode/…
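To see mkFileHandle in action without writing a new device, the sketch below wraps an ordinary file descriptor from GHC.IO.FD, which is roughly what openFile does internally. It assumes a recent base, where FD.openFile takes an extra Bool (open in non-blocking mode) and mkFileHandle is importable from GHC.IO.Handle.Internals; FD already has all the instances mkFileHandle needs. openViaFD is a hypothetical name.

```haskell
import qualified GHC.IO.FD as FD
import GHC.IO.Handle.Internals (mkFileHandle)
import System.IO

-- Hypothetical helper: open a file as an FD and build a UTF-8 text Handle on it.
openViaFD :: FilePath -> IOMode -> IO Handle
openViaFD path mode = do
  (fd, _devType) <- FD.openFile path mode False   -- False: blocking mode
  mkFileHandle fd path mode (Just utf8) nativeNewlineMode

main :: IO ()
main = do
  h <- openViaFD "fd-demo.txt" WriteMode
  hPutStrLn h "written through an FD-backed Handle"
  hClose h
  readFile "fd-demo.txt" >>= putStr
```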

Page 7

IODevice

-- | I/O operations required for implementing a 'Handle'.
class IODevice a where
  -- | closes the device.  Further operations on the device should
  -- produce exceptions.
  close :: a -> IO ()

  -- | seek to the specified position in the data.
  seek :: a -> SeekMode -> Integer -> IO ()
  seek _ _ _ = ioe_unsupportedOperation

  -- | return the current position in the data.
  tell :: a -> IO Integer
  tell _ = ioe_unsupportedOperation

  -- | returns 'True' if the device is a terminal or console.
  isTerminal :: a -> IO Bool
  isTerminal _ = return False

  … etc …

Default is for the operation to be unsupported (seek and tell throw ioe_unsupportedOperation; isTerminal just returns False).
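The defaults can be checked with a throwaway device that implements only the methods that have none. This is a sketch against the current GHC.IO.Device, whose IODevice class also requires ready and devType (not shown on the slide); NullDevice and isUnsupported are hypothetical names.

```haskell
import Control.Exception (try)
import GHC.IO.Device (IODevice (..), IODeviceType (..), SeekMode (..))
import GHC.IO.Exception (IOErrorType (..))
import System.IO.Error (ioeGetErrorType)

-- Hypothetical device implementing only the methods without defaults.
data NullDevice = NullDevice

instance IODevice NullDevice where
  ready _ _ _ = return True   -- always "ready"
  close _     = return ()
  devType _   = return Stream

-- True if the action fails with an UnsupportedOperation IOError.
isUnsupported :: IO a -> IO Bool
isUnsupported act = do
  r <- try act
  return $ case r of
    Left e  -> ioeGetErrorType e == UnsupportedOperation
    Right _ -> False

main :: IO ()
main = do
  isUnsupported (seek NullDevice AbsoluteSeek 0) >>= print  -- True
  isUnsupported (tell NullDevice) >>= print                 -- True
  isTerminal NullDevice >>= print                           -- False
```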

Page 8

BufferedIO

class BufferedIO dev where
  newBuffer         :: dev -> BufferState -> IO (Buffer Word8)

  fillReadBuffer    :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)
  fillReadBuffer0   :: dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)

  emptyWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
  flushWriteBuffer  :: dev -> Buffer Word8 -> IO (Buffer Word8)
  flushWriteBuffer0 :: dev -> Buffer Word8 -> IO (Int, Buffer Word8)

The device gets to allocate the buffer. This lets the device make the buffer point directly at the data in memory, for example.

The 0-versions are non-blocking; the non-0 versions must read or write at least one byte (but may transfer less than the whole buffer).
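A sketch of what "pointing the buffer directly at the data" means, using the GHC.IO.Buffer helpers emptyBuffer and readWord8Buf: existing memory is wrapped in a ForeignPtr without a finalizer (newForeignPtr_, as the memory-mapped example later does) and marked as already full, so reading needs no copy. The byte list stands in for a device's memory; withDeviceBuffer and sumBuffer are hypothetical names.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (newForeignPtr_)
import Foreign.Marshal.Array (withArrayLen)
import GHC.IO.Buffer

-- Hypothetical: wrap memory the "device" already owns in a read Buffer whose
-- data region (bufL to bufR) covers the whole block, so nothing is copied.
-- newForeignPtr_ attaches no finalizer: the memory stays owned by its creator.
withDeviceBuffer :: [Word8] -> (Buffer Word8 -> IO a) -> IO a
withDeviceBuffer bytes k =
  withArrayLen bytes $ \len ptr -> do
    fp <- newForeignPtr_ ptr
    let buf = (emptyBuffer fp len ReadBuffer) { bufR = len }
    k buf

-- Sum the bytes between bufL and bufR, reading them in place.
sumBuffer :: Buffer Word8 -> IO Int
sumBuffer buf = go (bufL buf) 0
  where
    go i acc
      | i >= bufR buf = return acc
      | otherwise = do
          w <- readWord8Buf (bufRaw buf) i
          go (i + 1) (acc + fromIntegral w)

main :: IO ()
main = withDeviceBuffer [1 .. 10] $ \buf -> do
  print (bufferElems buf)   -- 10
  sumBuffer buf >>= print   -- 55
```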

Page 9

RawIO

-- | A low-level I/O provider where the data is bytes in memory.
class RawIO a where
  read             :: a -> Ptr Word8 -> Int -> IO Int
  readNonBlocking  :: a -> Ptr Word8 -> Int -> IO (Maybe Int)
  write            :: a -> Ptr Word8 -> Int -> IO ()
  writeNonBlocking :: a -> Ptr Word8 -> Int -> IO Int

readBuf :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)

readBufNonBlocking :: RawIO dev => dev -> Buffer Word8 -> IO (Maybe Int, Buffer Word8)

writeBuf :: RawIO dev => dev -> Buffer Word8 -> IO ()

writeBufNonBlocking :: RawIO dev => dev -> Buffer Word8 -> IO (Int, Buffer Word8)
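The relationship between RawIO and BufferedIO can be sketched by writing readBuf by hand: read into the free space after bufR, then advance bufR by the number of bytes transferred. In this sketch the raw read is passed as a plain function rather than through a RawIO instance (the class's method signatures differ between base versions), and trickleSource is a hypothetical in-memory source that hands out at most three bytes per call, i.e. a blocking read that returns at least one byte but less than the whole buffer.

```haskell
import Data.IORef
import Data.Word (Word8)
import Foreign.Marshal.Array (pokeArray)
import Foreign.Ptr (Ptr, plusPtr)
import GHC.IO.Buffer

-- Hand-written analogue of readBuf: fill the space after bufR, then move bufR.
readBufWith :: (Ptr Word8 -> Int -> IO Int) -> Buffer Word8 -> IO (Int, Buffer Word8)
readBufWith rawRead buf = do
  n <- withBuffer buf $ \p -> rawRead (p `plusPtr` bufR buf) (bufferAvailable buf)
  return (n, buf { bufR = bufR buf + n })

-- Hypothetical source: copies at most 3 bytes per call (a "short read")
-- and returns 0 once its data is exhausted.
trickleSource :: IORef [Word8] -> Ptr Word8 -> Int -> IO Int
trickleSource ref dst room = do
  pending <- readIORef ref
  let chunk = take (min 3 room) pending
  pokeArray dst chunk
  writeIORef ref (drop (length chunk) pending)
  return (length chunk)

main :: IO ()
main = do
  src <- newIORef [1 .. 10]
  buf0 <- newByteBuffer 8 ReadBuffer
  (n1, buf1) <- readBufWith (trickleSource src) buf0
  (n2, buf2) <- readBufWith (trickleSource src) buf1
  (n3, buf3) <- readBufWith (trickleSource src) buf2
  print (n1, n2, n3, bufferElems buf3)   -- (3,3,2,8)
```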

Page 10

Example: a memory-mapped Handle

• Random-access read/write doesn’t perform very well with ordinary buffered I/O.
  – Let’s implement a Handle backed by a memory-mapped file
  – We need to
    1. define our device type
    2. make it an instance of IODevice and BufferedIO
    3. provide a way to create instances

Page 11

Example: memory-mapped files

1. Define our device type

data MemoryMappedFile = MemoryMappedFile {
    mmap_fd     :: FD,
    mmap_addr   :: !(Ptr Word8),
    mmap_length :: !Int,
    mmap_ptr    :: !(IORef Int)
  } deriving Typeable

• mmap_fd: an ordinary file descriptor, provided by GHC.IO.FD
• mmap_addr, mmap_length: the address in memory where our file is mapped, and its length
• mmap_ptr: the current file pointer (Handles have a built-in notion of the “current position” that we have to emulate)
• deriving Typeable: Typeable is one of the requirements for making a Handle

Page 12

aside: Buffers

module GHC.IO.Buffer ( Buffer(..), .. ) where

data Buffer e = Buffer {
    bufRaw   :: !(ForeignPtr e),
    bufState :: BufferState,   -- ReadBuffer | WriteBuffer
    bufSize  :: !Int,          -- in elements, not bytes
    bufL     :: !Int,          -- offset of first item in the buffer
    bufR     :: !Int           -- offset of last item + 1
  }

[Diagram: the bufRaw array holds bufSize elements; the data occupies the region from bufL up to, but not including, bufR.]
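To make the bufL/bufR bookkeeping concrete, the sketch below uses helpers that GHC.IO.Buffer itself provides (newByteBuffer, writeWord8Buf, readWord8Buf, bufferElems, bufferAvailable, isFullBuffer, isEmptyBuffer): writing appends at bufR, reading consumes from bufL. Recent versions of the Buffer type carry extra fields not shown on the slide, which these record updates leave alone. pushByte and popByte are hypothetical names.

```haskell
import Control.Monad (foldM)
import Data.Word (Word8)
import GHC.IO.Buffer

-- Hypothetical: append one byte at bufR, if there is room.
pushByte :: Word8 -> Buffer Word8 -> IO (Buffer Word8)
pushByte w buf
  | isFullBuffer buf = return buf          -- bufR == bufSize: no room left
  | otherwise = do
      writeWord8Buf (bufRaw buf) (bufR buf) w
      return buf { bufR = bufR buf + 1 }

-- Hypothetical: consume one byte from bufL, if the buffer holds any data.
popByte :: Buffer Word8 -> IO (Maybe (Word8, Buffer Word8))
popByte buf
  | isEmptyBuffer buf = return Nothing     -- bufL == bufR: no data
  | otherwise = do
      w <- readWord8Buf (bufRaw buf) (bufL buf)
      return (Just (w, buf { bufL = bufL buf + 1 }))

main :: IO ()
main = do
  b0 <- newByteBuffer 4 WriteBuffer
  b1 <- foldM (flip pushByte) b0 [10, 20, 30]
  print (bufL b1, bufR b1, bufferElems b1, bufferAvailable b1)  -- (0,3,3,1)
  Just (w, b2) <- popByte b1
  print (w, bufL b2, bufferElems b2)                             -- (10,1,2)
```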

Page 13

Example: memory-mapped files

2. (a) make it an instance of BufferedIO

instance BufferedIO MemoryMappedFile where
  newBuffer m state = do
    fp <- newForeignPtr_ (mmap_addr m)
    return (emptyBuffer fp (mmap_length m) state)

  fillReadBuffer m buf = do
    p <- readIORef (mmap_ptr m)
    let l = mmap_length m
    if p >= l
      then return (0, buf{ bufL=p, bufR=p })
      else do
        writeIORef (mmap_ptr m) l
        return (l-p, buf{ bufL=p, bufR=l })

  flushWriteBuffer m buf = do
    writeIORef (mmap_ptr m) (bufR buf)
    return buf{ bufL = bufR buf }

• fillReadBuffer returns the entire file! (the buffer spans the whole mapping, with bufL set to the current position)
• flush is a no-op: just remember where to read from next

Page 14

Example: memory-mapped files

2. (b) make it an instance of IODevice

instance IODevice MemoryMappedFile where
  close = IODevice.close . mmap_fd

  seek m mode val = do
    let sz = mmap_length m
    ptr <- readIORef (mmap_ptr m)
    let off = case mode of
                AbsoluteSeek -> fromIntegral val
                RelativeSeek -> ptr + fromIntegral val
                SeekFromEnd  -> sz + fromIntegral val
    when (off < 0 || off >= sz) $ ioe_seekOutOfRange
    writeIORef (mmap_ptr m) off

  tell m = do o <- readIORef (mmap_ptr m); return (fromIntegral o)

  getSize = return . fromIntegral . mmap_length

  … etc …
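The offset arithmetic in seek can be pulled out into a pure function and checked in isolation. This sketch uses a hypothetical name, seekTarget, and returns Nothing where the instance above would raise ioe_seekOutOfRange; it keeps the slide's bound, which rejects an offset equal to the file size.

```haskell
import System.IO (SeekMode (..))

-- Hypothetical: new file pointer after a seek, or Nothing if it falls
-- outside [0, size).
seekTarget :: Int -> Int -> SeekMode -> Integer -> Maybe Int
seekTarget size cur mode val
  | off < 0 || off >= size = Nothing
  | otherwise              = Just off
  where
    off = case mode of
      AbsoluteSeek -> fromIntegral val
      RelativeSeek -> cur + fromIntegral val
      SeekFromEnd  -> size + fromIntegral val

main :: IO ()
main = do
  print (seekTarget 100 40 AbsoluteSeek 10)    -- Just 10
  print (seekTarget 100 40 RelativeSeek (-5))  -- Just 35
  print (seekTarget 100 40 SeekFromEnd (-1))   -- Just 99
  print (seekTarget 100 40 SeekFromEnd 0)      -- Nothing
```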

Page 15

Example: memory-mapped files

3. provide a way to create instances

mmapFile :: FilePath -> IOMode -> Bool -> IO Handle
mmapFile filepath iomode binary = do
  (fd,_devtype) <- FD.openFile filepath iomode
  sz <- IODevice.getSize fd
  addr <- c_mmap nullPtr (fromIntegral sz) prot flags (FD.fdFD fd) 0
  ptr <- newIORef 0

  let m = MemoryMappedFile {
            mmap_fd     = fd,
            mmap_addr   = castPtr addr,
            mmap_length = fromIntegral sz,
            mmap_ptr    = ptr }
  let (encoding, newline)
        | binary    = (Nothing, noNewlineTranslation)
        | otherwise = (Just localeEncoding, nativeNewlineMode)

  mkFileHandle m filepath iomode encoding newline

• Open the file and mmap() it
• Call mkFileHandle to build the Handle
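The slide does not show c_mmap, prot or flags. A minimal sketch of those pieces, assuming a POSIX system: c_mmap and c_munmap are plain FFI imports of mmap(2) and munmap(2), and the PROT_READ and MAP_SHARED values are hard-coded Linux values (an assumption; taking them from <sys/mman.h> via hsc2hs would be more portable). It also assumes a recent base, where GHC.IO.FD.openFile takes an extra Bool for non-blocking mode. The demo maps a file read-only and peeks at its first bytes; mapFileBytes is a hypothetical name.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Data.Word (Word8)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (Ptr, castPtr, intPtrToPtr, nullPtr)
import qualified GHC.IO.Device as IODevice
import qualified GHC.IO.FD as FD
import System.IO (IOMode (..))
import System.Posix.Types (COff (..))

-- void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
foreign import ccall unsafe "sys/mman.h mmap"
  c_mmap :: Ptr () -> CSize -> CInt -> CInt -> CInt -> COff -> IO (Ptr ())

-- int munmap(void *addr, size_t length);
foreign import ccall unsafe "sys/mman.h munmap"
  c_munmap :: Ptr () -> CSize -> IO CInt

-- Assumed Linux values of the <sys/mman.h> constants.
protRead, mapShared :: CInt
protRead  = 0x1    -- PROT_READ
mapShared = 0x01   -- MAP_SHARED

-- Hypothetical: map a (non-empty) file read-only and return its first n bytes.
mapFileBytes :: FilePath -> Int -> IO [Word8]
mapFileBytes path n = do
  (fd, _) <- FD.openFile path ReadMode False
  sz <- IODevice.getSize fd
  addr <- c_mmap nullPtr (fromIntegral sz) protRead mapShared (FD.fdFD fd) 0
  if addr == intPtrToPtr (-1)   -- MAP_FAILED
    then IODevice.close fd >> ioError (userError "mmap failed")
    else do
      bytes <- peekArray (min n (fromIntegral sz)) (castPtr addr)
      _ <- c_munmap addr (fromIntegral sz)
      IODevice.close fd
      return bytes

main :: IO ()
main = do
  writeFile "mmap-demo.txt" "hello, mmap"
  mapFileBytes "mmap-demo.txt" 5 >>= print   -- [104,101,108,108,111]
```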

Page 16

Demo…

$ ./Setup configure
Configuring mmap-handle-0.0...
$ ./Setup build
Preprocessing library mmap-handle-0.0...
Building mmap-handle-0.0...
[1 of 1] Compiling System.Posix.IO.MMap ( dist/build/System/Posix/IO/MMap.hs, dist/build/System/Posix/IO/MMap.o )
Registering mmap-handle-0.0...
$ ./Setup register --inplace --user
Registering mmap-handle-0.0...
$ ghc-pkg list --user
/home/simonmar/.ghc/x86_64-linux-6.11.20090816/package.conf.d:
    mmap-handle-0.0

Page 17

Demo…

$ cat test.hs
import System.IO
import System.Posix.IO.MMap
import System.Environment
import Data.Char

main = do
  [file,test] <- getArgs
  h <- if test == "mmap"
         then mmapFile file ReadWriteMode True
         else openBinaryFile file ReadWriteMode

  sequence_ [ do hSeek h SeekFromEnd (-n)
                 c <- hGetChar h
                 hSeek h AbsoluteSeek n
                 hPutChar h c
            | n <- [1..10000] ]

  hClose h
  putStrLn "done"
$ ghc test.hs --make
[1 of 1] Compiling Main             ( test.hs, test.o )
Linking test ...

Page 18

Timings…

$ time ./test /tmp/words file
done
0.24s real   0.14s user   0.10s system   99%   ./test /tmp/words file
$ time ./test /tmp/words mmap
done
0.09s real   0.09s user   0.00s system   99%   ./test /tmp/words mmap
$ time ./test ./words file    # ./ is NFS-mounted
done
10.44s real   0.20s user   0.52s system   6%   ./test tmp file
$ time ./test ./words mmap    # ./ is NFS-mounted
done
0.10s real   0.09s user   0.00s system   93%   ./test tmp mmap

Page 19

More examples

• A Handle that pipes output bytes to a Chan
• Handles backed by Win32 HANDLEs
• A Handle that reads from a ByteString/text
• A Handle that reads from text

Page 20

The -1 mile view

• Inside the IO library
  – The file-descriptor functionality is cleanly separated from the implementation of Handles:
    • GHC.IO.FD implements file descriptors, with instances of IODevice and BufferedIO
    • GHC.IO.Handle.FD defines openFile, using FDs as the underlying device
    • GHC.IO.Handle has nothing to do with FDs

Page 21

Implementation of Handle

data Handle__
  = forall dev enc_state dec_state .
      (IODevice dev, BufferedIO dev, Typeable dev) =>
    Handle__ {
      haDevice     :: !dev,
      haType       :: HandleType,   -- read/write/append etc.
      haByteBuffer :: !(IORef (Buffer Word8)),
      haCharBuffer :: !(IORef (Buffer CharBufElem)),
      haEncoder    :: Maybe (TextEncoder enc_state),
      haDecoder    :: Maybe (TextDecoder dec_state),
      haCodec      :: Maybe TextEncoding,
      haInputNL    :: Newline,
      haOutputNL   :: Newline,
      .. some other things ..
    }
  deriving Typeable

• Existential: packs up the IODevice, BufferedIO and Typeable dictionaries, and the codec state is existentially quantified
• Two buffers: one for bytes, one for Chars.
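A toy model of the existential packing in Handle__: a hypothetical Device class stands in for IODevice/BufferedIO, the concrete device type is hidden inside SomeHandle, and Typeable lets a caller recover it later with cast (the "take the Handle apart again" use mentioned for mkFileHandle). All names here are illustrative, not part of GHC's API.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast)

-- Hypothetical stand-in for the IODevice/BufferedIO classes.
class Device d where
  describe :: d -> String

data FileDev = FileDev FilePath deriving (Eq, Show)
data MemDev  = MemDev Int       deriving (Eq, Show)

instance Device FileDev where describe (FileDev p) = "file " ++ p
instance Device MemDev  where describe (MemDev n)  = "memory block of " ++ show n ++ " bytes"

-- Like Handle__: the device type is hidden; its dictionaries travel with it.
data SomeHandle = forall dev . (Device dev, Typeable dev) => SomeHandle dev

-- Works for any device, via the packed Device dictionary.
describeHandle :: SomeHandle -> String
describeHandle (SomeHandle dev) = describe dev

-- Recover a concrete device type, if it is the one we expect.
asMemDev :: SomeHandle -> Maybe MemDev
asMemDev (SomeHandle dev) = cast dev

main :: IO ()
main = do
  let hs = [SomeHandle (FileDev "/tmp/words"), SomeHandle (MemDev 4096)]
  mapM_ (putStrLn . describeHandle) hs
  print (map asMemDev hs)   -- [Nothing,Just (MemDev 4096)]
```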

Page 22

Where to go from here

• This is a step in the right direction, but there is still some obvious ugliness
  – We haven’t changed the external API, only added to it
  – There should be a binary I/O layer
    • hPutBuf working on Handles is wrong: binary Handles should have a different type
    • in a sense, BufferedIO is a binary I/O layer: it is efficient, but inconvenient
  – FilePath should be an abstract type.
    • On Windows, file paths are natively a String, but on Unix they are natively bytes ([Word8]).
  – Should we rethink Handles entirely?
    • OO-style layers: binary IO, buffering, encoding
    • Separate read Handles from write Handles?
      – read/write Handles are a pain