[Gnu-arch-users] user space page-oriented, persistent transactional memo
From: Thomas Lord
Subject: [Gnu-arch-users] user space page-oriented, persistent transactional memory
Date: Wed, 11 Jan 2006 11:59:53 -0800
More on building a user-space file system....
`vudev', in the last message, simulates a raw disk which abstracts
away most details of disk geometry.
Real disks are increasingly likely to have very powerful
controllers, a large chunk of fairly fast non-volatile ram, and of
course a slow but huge capacity raw disk. What should the
controller be doing with its abundant compute capacity and
non-volatile ram?
I think that controllers ought to be doing more
to help implement fast, ACID transactions.
* `vumnd' - page-oriented, ACID transactional memory
The `vumnd' data structure is an array of fixed-size
pages of bytes. A client can transiently map
page-aligned, page-size-increment regions of data.
(`vumnd' can be implemented in a bit under 2KLOC; having a
good bitset library helps. So, total size so far is about 2.5KLOC
(vumnd + vudev).)
vumnd divides the address space of pages into two parts:
_ctrl pages_ and _heap pages_:
** ctrl pages
For some constant, K, pages 0..K-1 are "ctrl pages".
In a single write transaction a vumnd client can arbitrarily modify
the ctrl pages. All of these changes atomically become visible
to other clients only at the successful completion of the
transaction.
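A toy, in-memory sketch of the ctrl-page rule above, assuming a
shadow-copy approach (the post does not say how vumnd does it).
All names (`toy_ctrl', TOY_PAGE_SIZE, ...) are made up for
illustration and are not part of vumnd. It models visibility only;
a plain memcpy is of course not crash-atomic on real hardware.

```c
#include <assert.h>
#include <string.h>

enum { TOY_PAGE_SIZE = 64, TOY_N_CTRL = 4 };  /* toy sizes (assumed) */

struct toy_ctrl
{
  unsigned char committed[TOY_N_CTRL][TOY_PAGE_SIZE]; /* seen by readers */
  unsigned char shadow[TOY_N_CTRL][TOY_PAGE_SIZE];    /* writer's private copy */
  int in_txn;                    /* nonzero while a write transaction is open */
};

/* Open a write transaction: start from a private copy of the ctrl pages. */
void
toy_ctrl_begin (struct toy_ctrl * s)
{
  memcpy (s->shadow, s->committed, sizeof s->committed);
  s->in_txn = 1;
}

/* Writable view of a ctrl page; only valid inside a transaction. */
unsigned char *
toy_ctrl_post (struct toy_ctrl * s, int page)
{
  assert (s->in_txn && page >= 0 && page < TOY_N_CTRL);
  return s->shadow[page];
}

/* Read-only view of the last committed state of a ctrl page. */
const unsigned char *
toy_ctrl_pre (const struct toy_ctrl * s, int page)
{
  assert (page >= 0 && page < TOY_N_CTRL);
  return s->committed[page];
}

/* Commit: publish every modified ctrl page in one step. */
void
toy_ctrl_commit (struct toy_ctrl * s)
{
  assert (s->in_txn);
  memcpy (s->committed, s->shadow, sizeof s->committed);
  s->in_txn = 0;
}

/* Abort: discard the private copy; readers never saw it. */
void
toy_ctrl_abort (struct toy_ctrl * s)
{
  s->in_txn = 0;
}
```

A writer calls toy_ctrl_begin, edits pages through toy_ctrl_post,
and then either commits or aborts; readers going through
toy_ctrl_pre see the old contents until the commit.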
** Heap Pages
All pages K..MAX are "heap pages".
At all times, every heap page is either allocated or free.
Allocated pages can be mapped for read-only purposes. It is
an error to try to map a non-allocated page.
New pages are allocated by specifying their contents. Thus,
heap pages are write-once (until unallocated) and are
written at allocation time.
In a single write transaction a vumnd client can allocate
arbitrarily many new pages. The newly allocated pages will
atomically become visible to other clients only at the
successful completion of the transaction.
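A toy sketch of the heap-page rules, under the assumption that
"allocated" and "allocated in the open transaction" are tracked by
two separate bitsets (plain byte arrays here). The names
(`toy_heap', toy_heap_alloc, ...) are hypothetical and are not the
vumnd API; freeing is left out to keep the sketch short.

```c
#include <assert.h>
#include <string.h>

enum { TOY_HP_SIZE = 64, TOY_HP_COUNT = 16 };  /* toy sizes (assumed) */

struct toy_heap
{
  unsigned char data[TOY_HP_COUNT][TOY_HP_SIZE];
  unsigned char allocated[TOY_HP_COUNT]; /* visible to every client */
  unsigned char pending[TOY_HP_COUNT];   /* allocated in the open transaction */
};

/* Allocate one page by supplying its contents (write-once).
   Returns the page index, or -1 if no page is free. */
int
toy_heap_alloc (struct toy_heap * h, const unsigned char * contents)
{
  assert (contents != 0);
  for (int p = 0; p < TOY_HP_COUNT; ++p)
    if (!h->allocated[p] && !h->pending[p])
      {
        memcpy (h->data[p], contents, TOY_HP_SIZE);
        h->pending[p] = 1;
        return p;
      }
  return -1;
}

/* Map a page read-only; mapping a non-allocated page is an error (NULL). */
const unsigned char *
toy_heap_map (const struct toy_heap * h, int p)
{
  if (p < 0 || p >= TOY_HP_COUNT || !h->allocated[p])
    return 0;
  return h->data[p];
}

/* Commit: every page allocated in the transaction becomes visible. */
void
toy_heap_commit (struct toy_heap * h)
{
  for (int p = 0; p < TOY_HP_COUNT; ++p)
    if (h->pending[p])
      {
        h->allocated[p] = 1;
        h->pending[p] = 0;
      }
}

/* Abort: pending pages silently return to the free pool. */
void
toy_heap_abort (struct toy_heap * h)
{
  memset (h->pending, 0, sizeof h->pending);
}
```

Because a page's bytes are written exactly once, before anyone else
can map it, readers never observe a page changing underneath them.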
** Rationale
We schematically conceive of the economic sweetspot for tertiary
storage devices to be:
    storage host       <--->  .... to host system
    (on controller
     g.p. system)
      ^          ^
      |          |
    fast            low-level
    non-volatile <-> controller
    RAM                 |
                        |
                        v
                    raw storage
vumnd "ctrl pages" are regions of the non-volatile ram made
accessible to host system control. The storage host gets
to make sure that ctrl pages behave transactionally.
vumnd "heap pages" are regions of the raw storage. We
simply give up on hard questions like concurrent writes
or writes concurrent with reads. The write-once-at-alloc-time
rule is as much as we actually need for a file system and is
certainly a lowest common denominator of what real disks
will expose. The allocator can be implemented on the
storage-host side but, if not, is easy to implement on the
host-system side. (If it runs on the storage-host side, device/host
i/o bandwidth is conserved.)
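One way to see why write-once heap pages are enough for a file
system: data is never updated in place. A new page is allocated
with the new contents and a pointer held in a ctrl page is swung
over to it. The sketch below is a toy illustration of that idea
only. The names (`toy_cow', toy_cow_update) are invented, a single
int stands in for a ctrl page, and the steps that vumnd would group
into one write transaction are simply done in sequence.

```c
#include <assert.h>
#include <string.h>

enum { TOY_COW_PAGE = 32, TOY_COW_COUNT = 8 };  /* toy sizes (assumed) */

struct toy_cow
{
  int root;   /* stand-in for a ctrl page: index of the live heap page, or -1 */
  unsigned char heap[TOY_COW_COUNT][TOY_COW_PAGE];
  unsigned char used[TOY_COW_COUNT];
};

/* Find a free heap page, fill it once, mark it used; -1 if none is free. */
int
toy_cow_alloc (struct toy_cow * s, const unsigned char * contents)
{
  for (int p = 0; p < TOY_COW_COUNT; ++p)
    if (!s->used[p])
      {
        memcpy (s->heap[p], contents, TOY_COW_PAGE);
        s->used[p] = 1;
        return p;
      }
  return -1;
}

/* "Modify" the live data without rewriting any allocated page:
   allocate a fresh page, repoint the root, free the old page.
   Returns the new root, or -1 if no page was free. */
int
toy_cow_update (struct toy_cow * s, const unsigned char * new_contents)
{
  int old = s->root;
  int fresh = toy_cow_alloc (s, new_contents);

  if (fresh < 0)
    return -1;
  s->root = fresh;        /* the only in-place change is the pointer */
  if (old >= 0)
    s->used[old] = 0;     /* the old page returns to the free pool */
  return fresh;
}
```

In vumnd terms the pointer update would go through a ctrl page and
the allocation and free through the heap calls, all inside a single
write transaction, so other clients would see either the old root
or the new one and nothing in between.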
* API
* int vumnd_create (const t_uchar ** const err,
const t_uchar * const uri,
t_vudev_page_addr n_ctrl_pages);
* t_vumnd_connection vumnd_connect (const t_uchar ** const err,
const t_uchar * const uri);
* t_vumnd_connection vumnd_dup (const t_uchar ** const err,
t_vumnd_connection cxn);
* int vumnd_disconnect (const t_uchar ** const err,
t_vumnd_connection cxn);
Create, connect to, duplicate a connection to, or
disconnect from a vumnd-formatted vudev virtual disk.
* t_vudev_page_addr vumnd_n_ctrl_pages (const t_uchar ** const err,
t_vumnd_connection cxn);
The number of ctrl (transactional) pages (addressed 0..N-1).
* int vumnd_write_lock (const t_uchar ** const err,
t_vumnd_connection const cxn);
* int vumnd_have_write_lock (const t_uchar ** const err,
t_vumnd_connection const cxn);
* int vumnd_write_unlock (const t_uchar ** const err,
t_vumnd_connection const cxn);
* int vumnd_read_lock (const t_uchar ** const err,
t_vumnd_connection const cxn);
* int vumnd_have_read_lock (const t_uchar ** const err,
t_vumnd_connection const cxn);
* int vumnd_read_unlock (const t_uchar ** const err,
t_vumnd_connection const cxn);
Acquire, test or release a write or read lock.
* t_vumnd_chunk vumnd_pre_ctrl (const t_uchar ** const err,
t_vumnd_connection const cxn,
t_vudev_page_addr page,
t_vudev_page_addr n_pages);
* t_vumnd_chunk vumnd_pre_heap (const t_uchar ** const err,
t_vumnd_connection const cxn,
t_vudev_page_addr page,
t_vudev_page_addr n_pages);
Return a chunk of memory for reading the pre-transaction
state of the indicated pages. Heap pages must already have been
allocated when the transaction started.
* t_vumnd_chunk vumnd_post_ctrl (const t_uchar ** const err,
t_vumnd_connection const cxn,
t_vudev_page_addr page,
t_vudev_page_addr n_pages);
Return a chunk of memory for writing to a ctrl page.
All changes made to a ctrl page will be made atomically
visible to other clients only at the successful end of the
transaction.
* t_vudev_page_addr vumnd_post_alloc (const t_uchar ** const err,
t_vumnd_connection const cxn,
t_uchar * data,
t_vudev_page_addr n_pages);
Allocate `n_pages' heap pages and fill them with `n_pages' pages
of bytes copied from `data'. Return the address of the new pages.
* int vumnd_post_free (const t_uchar ** const err,
t_vumnd_connection const cxn,
t_vudev_page_addr page,
t_vudev_page_addr n_pages);
Free heap pages.