Discussion:
[linux-elitists] etcd: A highly-available key value store
Don Marti
2013-08-13 14:30:59 UTC
Permalink
Making the rounds...
http://coreos.com/docs/etcd/

"A highly-available key value store for shared
configuration and service discovery."

Another useful thing you could do with this is
replicate Git. The way Git is designed makes
it easy to replicate (I disclosed this obvious
fact just in case USPTO wants to apply its usual
low standard of what's obvious and what's not:
http://ip.com/IPCOM/000225058 ). Instead of a complex
replication scheme, you only need a highly-available
key value store to keep branch references in sync.
The testAndSet in etcd looks like what you need.
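To make the idea concrete, here is a minimal sketch of the
compare-and-swap semantics involved. The RefStore class below is a
hypothetical in-memory stand-in for etcd's testAndSet (real code would
talk to the etcd HTTP API); the point is that a CAS on one key per
branch is enough to serialize ref updates across replicas.

```python
# Hypothetical in-memory stand-in for etcd's testAndSet, to show how a
# compare-and-swap on a single key serializes branch updates.
class RefStore:
    def __init__(self):
        self._refs = {}

    def test_and_set(self, key, prev_value, new_value):
        """Atomically set key to new_value iff it equals prev_value."""
        if self._refs.get(key) != prev_value:
            return False  # another replica won the race; reject this push
        self._refs[key] = new_value
        return True

    def get(self, key):
        return self._refs.get(key)

store = RefStore()
# First push: the branch is unborn (None); advance it to commit aaaa111.
assert store.test_and_set("refs/heads/master", None, "aaaa111")
# A concurrent push based on the stale unborn state loses the race.
assert not store.test_and_set("refs/heads/master", None, "bbbb222")
# A push that saw the current tip succeeds, like a fast-forward.
assert store.test_and_set("refs/heads/master", "aaaa111", "bbbb222")
```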

Also, it's in Go. On that subject, bonus link:
http://blog.lusis.org/blog/2013/08/11/go-for-system-administrators/

(Looks tempting, a lot easier to deploy than dealing
with large-scale applications in the usual dynamic
languages.)
--
Don Marti +1-510-332-1587 (mobile)
http://zgp.org/~dmarti/ Alameda, California, USA
***@zgp.org
Greg KH
2013-08-13 17:21:16 UTC
Permalink
Post by Don Marti
Making the rounds...
http://coreos.com/docs/etcd/
"A highly-available key value store for shared
configuration and service discovery."
Another useful thing you could do with this is
replicate Git. The way Git is designed makes
it easy to replicate (I disclosed this obvious
fact just in case USPTO wants to apply its usual
http://ip.com/IPCOM/000225058 ). Instead of a complex
replication scheme, you only need a highly-available
key value store to keep branch references in sync.
The testAndSet in etcd looks like what you need.
So, I asked Xiang Li, the main developer of etcd, about this yesterday
on irc, as I saw you had posted this idea elsewhere. He said it really
wouldn't work, as the raft protocol is only for "small" amounts of data
per key (100kb). This is due to the requirements of syncing the servers
together in a specific amount of time (in ms).

He did mention that it could possibly be extended to handle larger
amounts of data, if the sync time is relaxed, if someone wants to work
on making those changes to it.

OB Disclosure, I've known the CoreOS engineers/owners for 9+ years, and
am an advisor to their company.
Post by Don Marti
http://blog.lusis.org/blog/2013/08/11/go-for-system-administrators/
I've played around a bit in Go, and am very impressed. There are some
issues with libraries that seem easy for developers to mess up, but as
long as you don't rely on random github repos for a shipping product
(that is, copy them locally and use specific tags), you should be
fine.

greg k-h
Greg KH
2013-08-13 19:03:05 UTC
Permalink
On Tue, Aug 13, 2013 at 01:41:33PM -0500, Jim Thompson wrote:

<in html and cc: me, both of which the linux-elitists list rejects...>
Post by Greg KH
He did mention that it could possibly be extended to handle larger
amounts of data, if the sync time is relaxed, if someone wants to work
on making those changes to it.
Is this just an issue of bandwidth between the CoreOS instances? Is he
assuming 1Gbps links?
No, this is a raft protocol requirement from what I can tell; it has
nothing to do with CoreOS. I think the raft paper is at:
https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
but that seems to be a draft; there's probably a published paper
somewhere that describes it more completely.

Although, in skimming this, it might just be a limitation of the
go-raft implementation that is in etcd, as I couldn't find any size
limitations in the raft paper, but I could have missed them.

greg k-h
Andrew Cowie
2013-08-14 00:58:13 UTC
Permalink
Post by Greg KH
Although, in skimming this, it might just be a limitation of the
go-raft implementation that is in etcd, as I couldn't find any size
limitations in the raft paper, but I could have missed them.
There aren't limitations in the raft protocol, as such (modulo time for
clients to actually transmit information, but you'd think that volume is
going to be << available bandwidth over time), but you do have to pay
attention to the log compaction issue; it's non-trivial to know when a
consensus node can safely discard an entry in its log. So log size, over
time, might be a limitation for you. There's an interesting PDF hanging
off this page (which, I found out, was basically extracted from the
original paper):
https://ramcloud.stanford.edu/wiki/display/logcabin/Compaction
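To illustrate the compaction issue, here is a toy model (all names are
illustrative, not from any real raft implementation): a node may
discard a log entry only once it is covered by a snapshot, because a
lagging follower may still need anything newer replayed to it.

```python
class RaftLog:
    """Toy model of one raft node's log, showing the compaction rule."""

    def __init__(self):
        self.entries = []        # list of (index, command) pairs
        self.commit_index = 0    # highest index known to be committed
        self.snapshot_index = 0  # highest index folded into a snapshot

    def append(self, command):
        index = self.entries[-1][0] + 1 if self.entries else 1
        self.entries.append((index, command))
        return index

    def take_snapshot(self):
        # The snapshot captures the state machine up to the commit point.
        self.snapshot_index = self.commit_index

    def compact(self):
        # Only entries covered by the snapshot are safe to drop; newer
        # ones may still be needed to catch up lagging followers.
        self.entries = [(i, c) for (i, c) in self.entries
                        if i > self.snapshot_index]

log = RaftLog()
for cmd in ("set a", "set b", "set c"):
    log.append(cmd)
log.commit_index = 2       # entries 1-2 committed, entry 3 not yet
log.take_snapshot()
log.compact()
assert log.entries == [(3, "set c")]  # uncovered entry must survive
```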

There are quite a few people working on raft implementations:
https://ramcloud.stanford.edu/wiki/display/logcabin/LogCabin

AfC
Sydney
Don Marti
2013-08-13 21:58:02 UTC
Permalink
Post by Greg KH
Post by Don Marti
Making the rounds...
http://coreos.com/docs/etcd/
"A highly-available key value store for shared
configuration and service discovery."
Another useful thing you could do with this is
replicate Git. The way Git is designed makes
it easy to replicate (I disclosed this obvious
fact just in case USPTO wants to apply its usual
http://ip.com/IPCOM/000225058 ). Instead of a complex
replication scheme, you only need a highly-available
key value store to keep branch references in sync.
The testAndSet in etcd looks like what you need.
So, I asked Xiang Li, the main developer of etcd, about this yesterday
on irc, as I saw you had posted this idea elsewhere. He said it really
wouldn't work, as the raft protocol is only for "small" amounts of data
per key (100kb). This is due to the requirements of syncing the servers
together in a specific amount of time (in ms).
You don't need to maintain the Git objects in
etcd--just refs. You can move objects around with a
DHT or git-send-pack. The only time you would need to
test and set a value in Raft is from an update hook.

The update hook would check if the value stored in the
ref being pushed to on that repository is the same as
the value in Raft, and attempt to update the value in
Raft before updating the ref on the repo and accepting
the push. If this succeeds, then the slower task
of transferring the objects and updating the refs on
the other replicas to match what's in Raft can start.

Every other replicated copy is going to refuse pushes
to that branch until the objects reach it and its
copy of the ref gets updated to match Raft, but Git
users can already deal with it when a push fails.
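The flow described above can be sketched as a Git update hook. Git
invokes the hook with (refname, old-sha, new-sha) and rejects the push
on a nonzero exit; everything else here is an assumption -- the
etcd_test_and_set() helper and the key layout are hypothetical
stand-ins for a real etcd client.

```python
import sys

ZERO_SHA = "0" * 40  # git's "no previous value" marker

def etcd_test_and_set(key, prev, new, _store={}):
    """Hypothetical stand-in for an etcd testAndSet call."""
    if _store.get(key) != prev:
        return False
    _store[key] = new
    return True

def update_hook(refname, old_sha, new_sha):
    """Git runs the update hook once per ref; nonzero exit rejects it."""
    key = "/git/myrepo/" + refname           # hypothetical key layout
    prev = None if old_sha == ZERO_SHA else old_sha
    if not etcd_test_and_set(key, prev, new_sha):
        # Another replica already advanced this ref; refuse the push,
        # much like a non-fast-forward, and let the user fetch and retry.
        return 1
    # Winning the CAS makes this replica the owner of the new tip; the
    # slower job of shipping objects to the other replicas starts now.
    return 0

if __name__ == "__main__" and len(sys.argv) >= 4:
    sys.exit(update_hook(sys.argv[1], sys.argv[2], sys.argv[3]))
```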
Post by Greg KH
Post by Don Marti
http://blog.lusis.org/blog/2013/08/11/go-for-system-administrators/
I've played around a bit in Go, and am very impressed. There are some
issues with libraries that seem easy for developers to mess up, but as
long as you don't rely on random github repos for a shipping product
(that is, copy them locally and use specific tags), you should be
fine.
Tagged releases and locavore build and test servers
-- already making sure to do this.
--
Don Marti +1-510-332-1587 (mobile)
http://zgp.org/~dmarti/ Alameda, California, USA
***@zgp.org
Don Marti
2013-10-29 13:20:29 UTC
Permalink
Post by Don Marti
Post by Greg KH
Post by Don Marti
Making the rounds...
http://coreos.com/docs/etcd/
"A highly-available key value store for shared
configuration and service discovery."
Another useful thing you could do with this is
replicate Git.
So, I asked Xiang Li, the main developer of etcd, about this yesterday
on irc, as I saw you had posted this idea elsewhere. He said it really
wouldn't work, as the raft protocol is only for "small" amounts of data
per key (100kb). This is due to the requirements of syncing the servers
together in a specific amount of time (in ms).
You don't need to maintain the Git objects in
etcd--just refs. You can move objects around with a
DHT or git-send-pack. The only time you would need to
test and set a value in Raft is from an update hook.
First whack at an implementation:
https://github.com/dmarti/piehole

The general idea is that I'll have copies of Git
repositories on multiple VPSs and in-house--push to
one, sync with all the rest. Putting it up to see
what I've missed -- as far as I can tell, connecting
the difficult Computer Science in etcd to the difficult
Computer Science in git, using a simple hook, gives
you a high-availability versioning system.
--
Don Marti +1-510-332-1587 (mobile)
http://zgp.org/~dmarti/ Alameda, California, USA
***@zgp.org
a***@bavariati.org
2013-08-13 17:54:30 UTC
Permalink
Post by Don Marti
Making the rounds...
http://coreos.com/docs/etcd/
"A highly-available key value store for shared
configuration and service discovery."
Hm. Clustered KV store with locking, for people who don't feel like doing
it on Erlang OTP.
I need to learn why that would be useful.
Post by Don Marti
http://blog.lusis.org/blog/2013/08/11/go-for-system-administrators/
Thanks for that.
Just had a 2MB Go binary thrown over the wall to me. Knocked together a
rough equivalent in 50 lines of Python that ran twice as fast. I think
I've worked out why (the Go program, as written, fopens/fcloses a file for
every line processed, which permits parallelism but eats CPU time for
lunch) but still, it left a bad impression.

Just a simple unfrozen caveman sysadmin,
Aaron
Greg KH
2013-08-13 18:12:07 UTC
Permalink
Post by a***@bavariati.org
Just had a 2MB Go binary thrown over the wall to me. Knocked together a
rough equivalent in 50 lines of Python that ran twice as fast. I think
I've worked out why (the Go program, as written, fopens/fcloses a file for
every line processed, which permits parallelism but eats CPU time for
lunch) but still, it left a bad impression.
I had a python program here that someone wrote for me that did much the
same thing (exponential time increase for the size of the file it was
reading). Rewrote it in C with a 1000% speedup. I should port it to go
one of these days just to get some more practice...

In other words, don't blame the language for the stupid things that the
program's author does :)

greg k-h
Aaron Burt
2013-08-14 05:49:05 UTC
Permalink
Post by Greg KH
I had a python program here that someone wrote for me that did much the
same thing (exponential time increase for the size of the file it was
reading.) Rewrote it in C with a 1000% speedup. I should port it to go
one of these days just to get some more practice...
Hm, did it start with "for line in file.readlines():"? I still find those
in my early .py utils and weep for the wasted runtime before excising the
".readlines()" bit.
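For anyone who hasn't hit this: .readlines() materializes the whole
file as a list before the loop starts, while iterating the file object
streams one line at a time. A small sketch (io.StringIO stands in for
a large file):

```python
import io

data = io.StringIO("one\ntwo\nthree\n")  # stands in for a large file

# Wasteful: builds the entire list in memory before the loop runs.
lines = data.readlines()

# Streaming: same lines, one at a time, no intermediate list.
data.seek(0)
streamed = [line for line in data]

assert lines == streamed == ["one\n", "two\n", "three\n"]
```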

But yeah, time to recode in Go for fun, neural plasticity and parallelism.
Best make sure I have the disk space for all that static binary goodness.
Post by Greg KH
In other words, don't blame the language for the stupid things that the
program's author does :)
Oh, I save that kind of prejudice for Ruby, which richly deserves it.