Thread overview
Encapsulating Locked File Appends
Mar 10, 2009
dsimcha
Mar 10, 2009
Sean Kelly
Mar 10, 2009
dsimcha
Mar 10, 2009
Sean Kelly
Mar 10, 2009
BCS
Mar 10, 2009
dsimcha
Mar 10, 2009
Alexander Pánek
March 10, 2009
Is there an easy way to write a function, or a function already written for me, that will allow for a file shared between processes to be appended to safely?  This should be done in a way that will make it impossible for two processes to write to it at the same time, using locking, and should be platform-independent.

Performance does not matter because, in the use case I have, we're only
talking about one update every few minutes.  Simplicity, however, does matter.
 All I'm trying to do is run a simulation thousands of times on a bunch of
different computers sharing an NFS file system and have all of the results end
up in one nice plain text file instead of having each instance write to its
own file and having to keep track of them and piece them together by hand.
March 10, 2009
dsimcha wrote:
> Is there an easy way to write a function, or a function already written for
> me, that will allow for a file shared between processes to be appended to
> safely?  This should be done in a way that will make it impossible for two
> processes to write to it at the same time, using locking, and should be
> platform-independent.
> 
> Performance does not matter because, in the use case I have, we're only
> talking about one update every few minutes.  Simplicity, however, does matter.
>  All I'm trying to do is run a simulation thousands of times on a bunch of
> different computers sharing an NFS file system and have all of the results end
> up in one nice plain text file instead of having each instance write to its
> own file and having to keep track of them and piece them together by hand.

Use a second file to represent the lock, with the contents as the lock owner.  While the file exists, poll.  When it's not there, create it and write the process id into it, then unlink the file when you're done.
March 10, 2009
== Quote from Sean Kelly (sean@invisibleduck.org)'s article
> dsimcha wrote:
> > Is there an easy way to write a function, or a function already written for me, that will allow for a file shared between processes to be appended to safely?  This should be done in a way that will make it impossible for two processes to write to it at the same time, using locking, and should be platform-independent.
> >
> > Performance does not matter because, in the use case I have, we're only
> > talking about one update every few minutes.  Simplicity, however, does matter.
> >  All I'm trying to do is run a simulation thousands of times on a bunch of
> > different computers sharing an NFS file system and have all of the results end
> > up in one nice plain text file instead of having each instance write to its
> > own file and having to keep track of them and piece them together by hand.
> Use a second file to represent the lock, with the contents as the lock owner.  While the file exists, poll.  When it's not there, create it and write the process id into it, then unlink the file when you're done.

Wouldn't you have to somehow atomically poll and create the file?   What if some
other process created the lock file between your call of exists() and write()?
March 10, 2009
dsimcha wrote:
> == Quote from Sean Kelly (sean@invisibleduck.org)'s article
>> dsimcha wrote:
>>> Is there an easy way to write a function, or a function already written for
>>> me, that will allow for a file shared between processes to be appended to
>>> safely?  This should be done in a way that will make it impossible for two
>>> processes to write to it at the same time, using locking, and should be
>>> platform-independent.
>>>
>>> Performance does not matter because, in the use case I have, we're only
>>> talking about one update every few minutes.  Simplicity, however, does matter.
>>>  All I'm trying to do is run a simulation thousands of times on a bunch of
>>> different computers sharing an NFS file system and have all of the results end
>>> up in one nice plain text file instead of having each instance write to its
>>> own file and having to keep track of them and piece them together by hand.
>> Use a second file to represent the lock, with the contents as the lock
>> owner.  While the file exists, poll.  When it's not there, create it and
>> write the process id into it, then unlink the file when you're done.
> 
> Wouldn't you have to somehow atomically poll and create the file?   What if some
> other process created the lock file between your call of exists() and write()?

If you use the "create only" flag when opening the file then it should fail if the file was already created by someone else.  Unless NFS doesn't provide a sufficiently reliable synchronization mechanism for this to work, that is (I really don't know).
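
For illustration, here is a minimal sketch of the lock-file scheme with a "create only" open, written in Python. The names acquire_lock and locked_append, the ".lock" suffix, and the polling interval are all hypothetical. The sketch assumes the file system honours exclusive creation, which, as noted above, may not hold on every NFS setup.

```python
import os
import time


def acquire_lock(lock_path, poll_seconds=0.5):
    """Poll until the lock file can be created exclusively, then record our pid in it."""
    while True:
        try:
            # O_CREAT | O_EXCL folds "check that it is absent" and "create it" into
            # one call: it raises FileExistsError if the file is already there.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            time.sleep(poll_seconds)  # someone else holds the lock; try again later
            continue
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))  # contents identify the lock owner
        return


def locked_append(path, text, lock_path=None):
    """Append text to path while holding the lock file (hypothetical helper)."""
    lock_path = lock_path or path + ".lock"
    acquire_lock(lock_path)
    try:
        with open(path, "a") as out:
            out.write(text)
    finally:
        os.remove(lock_path)  # unlink the lock so other processes can proceed
```

One caveat with this design: if a process dies while holding the lock, the lock file stays behind and every other process polls forever, so a real version would probably want a timeout or a stale-lock check based on the recorded pid.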
March 10, 2009
Sean Kelly wrote:
> dsimcha wrote:
>> == Quote from Sean Kelly (sean@invisibleduck.org)'s article
>>> dsimcha wrote:
>>>> Is there an easy way to write a function, or a function already written for
>>>> me, that will allow for a file shared between processes to be appended to
>>>> safely?  This should be done in a way that will make it impossible for two
>>>> processes to write to it at the same time, using locking, and should be
>>>> platform-independent.
>>>>
>>>> Performance does not matter because, in the use case I have, we're only
>>>> talking about one update every few minutes.  Simplicity, however, does matter.
>>>>  All I'm trying to do is run a simulation thousands of times on a bunch of
>>>> different computers sharing an NFS file system and have all of the results end
>>>> up in one nice plain text file instead of having each instance write to its
>>>> own file and having to keep track of them and piece them together by hand.
>>> Use a second file to represent the lock, with the contents as the lock
>>> owner.  While the file exists, poll.  When it's not there, create it and
>>> write the process id into it, then unlink the file when you're done.
>>
>> Wouldn't you have to somehow atomically poll and create the file?   What if some
>> other process created the lock file between your call of exists() and write()?
> 
> If you use the "create only" flag when opening the file then it should fail if the file was already created by someone else.  Unless NFS doesn't provide a sufficiently reliable synchronization mechanism for this to work, that is (I really don't know).

I've worked a lot with NFS and have the scars and learned the curses to prove it.

NFS is as non-deterministic as it gets when it comes to concurrent writes. There is next to no guarantee. The append problem is an absolute classic on NFS. I tried about five different schemes, and all of them failed under mysterious circumstances. What I do now, and suggest you do too, is to have each process create its own file. After all processes have ended, have a master process assemble the small files into one. It's really the only thing I got to work.


Andrei
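
A rough sketch of the one-file-per-process approach described above, in Python. The function names and the "result-<host>-<pid>.txt" naming scheme are made up for illustration. It assumes host name plus process id is unique enough among the workers and that the merge step runs only after every worker has finished.

```python
import glob
import os
import socket


def result_path(directory):
    """Return a file name unique to this host and process (hypothetical naming scheme)."""
    name = f"result-{socket.gethostname()}-{os.getpid()}.txt"
    return os.path.join(directory, name)


def write_result(directory, text):
    """Each worker appends only to its own file, so no locking is needed."""
    with open(result_path(directory), "a") as out:
        out.write(text)


def merge_results(directory, merged_path):
    """Run once, after all workers have ended, to concatenate the pieces."""
    pattern = os.path.join(glob.escape(directory), "result-*.txt")
    with open(merged_path, "w") as merged:
        for piece in sorted(glob.glob(pattern)):
            with open(piece) as src:
                merged.write(src.read())
```

The merged file should use a name that does not match the "result-*.txt" pattern, otherwise a second merge would pick up its own earlier output.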
March 10, 2009
Reply to Andrei,

> NFS is as non-deterministic as it gets when it comes to concurrent
> writes. There is next to no guarantee. The append problem is an
> absolute classic on NFS. I tried about five different schemes, all
> failed under mysterious circumstances. What I do now and suggest you
> do too is to have each different process create its own file. After
> all processes have ended, have a master process assemble all small
> files into one. It's really the only thing I got to work.
> 
> Andrei
> 

IIRC there is a lockd process on most NFS systems. I think it does something for this issue but I don't know what.
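
For what it's worth, a sketch of what relying on that kind of locking might look like, in Python. It assumes that lockd is what services POSIX advisory locks (the fcntl kind) on an NFS mount, and that every writer cooperates by taking the same lock. The function name is hypothetical, and the fcntl module is POSIX-only, so this does not cover Windows.

```python
import fcntl  # POSIX only; not available on Windows
import os


def append_with_fcntl_lock(path, text):
    """Append text while holding an exclusive advisory lock on the file itself."""
    with open(path, "a") as out:
        fcntl.lockf(out, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            out.write(text)
            out.flush()
            os.fsync(out.fileno())  # push the data out before releasing the lock
        finally:
            fcntl.lockf(out, fcntl.LOCK_UN)
```

The lock is advisory: it only protects against other processes that also call lockf on the same file, not against a process that simply opens the file and writes.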


March 10, 2009
== Quote from BCS (ao@pathlink.com)'s article
> Reply to Andrei,
> > NFS is as non-deterministic as it gets when it comes to concurrent writes. There is next to no guarantee. The append problem is an absolute classic on NFS. I tried about five different schemes, all failed under mysterious circumstances. What I do now and suggest you do too is to have each different process create its own file. After all processes have ended, have a master process assemble all small files into one. It's really the only thing I got to work.
> >
> > Andrei
> >
> IIRC there is a lockd process on most NFS systems. I think it does something for this issue but I don't know what.

Thanks, but ideally I'd like to do this in a way that doesn't _have_ to be NFS, since this is a general problem I have and only this instance is on NFS.  I'd like to write a generic function called lockedAppend that works on both Linux and Windows and is filesystem-agnostic.  As far as I can tell, this is close to impossible, so maybe I'm better off having all my processes write to separate files.  There is no reason why they absolutely _have_ to all write to the same file; it would just be more convenient if they did.
March 10, 2009
dsimcha wrote:
> Is there an easy way to write a function, or a function already written for
> me, that will allow for a file shared between processes to be appended to
> safely?  This should be done in a way that will make it impossible for two
> processes to write to it at the same time, using locking, and should be
> platform-independent.
> 
> Performance does not matter because, in the use case I have, we're only
> talking about one update every few minutes.  Simplicity, however, does matter.
>  All I'm trying to do is run a simulation thousands of times on a bunch of
> different computers sharing an NFS file system and have all of the results end
> up in one nice plain text file instead of having each instance write to its
> own file and having to keep track of them and piece them together by hand.

Ideally you’d use a clustered file system for that, namely GFS (Global File System), OCFS(2) (Oracle, I think) or Lustre (Sun)... but I don’t know what your use-case is so you might not really need it.

0.02€