Baidu File System – A distributed file system for real-time applications

notacoward · on Oct 25, 2016

Disclosure: I'm a Gluster developer.

Looks like a pretty good first attempt at a distributed filesystem. My initial impression is HDFS with a distributed NameNode/Nameserver. The first diagram also shows a Metaserver layer that isn't mentioned at all in the more recent of the two design docs, though "separate Metaserver from Nameserver" appears (unchecked) in the roadmap. All operations using access methods other than their own SDK seem to get funneled through the NameServer cluster, which will severely limit throughput. It's not clear how they do replication, though it's weakly implied that it's driven from the client (like Gluster) or the NameServer rather than the first ChunkServer (like Ceph, HDFS, and everything else). There's no mention of how they handle consistency or repair, and likewise no information about performance or security. It's not clear whether it's anywhere near POSIX compliant (probably not).
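The two replication styles contrasted above differ mainly in who pushes the bytes. A minimal sketch, using hypothetical names (`Hop`, `fan_out_hops`, `chain_hops`) that are not taken from BFS, Gluster, or HDFS code, lists the network transfers each style produces for a single chunk write:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One network transfer of a chunk: (sender, receiver).
using Hop = std::pair<std::string, std::string>;

// Client-driven fan-out: the client itself sends the data to every replica.
std::vector<Hop> fan_out_hops(const std::string& client,
                              const std::vector<std::string>& replicas) {
    assert(!replicas.empty());
    std::vector<Hop> hops;
    for (const auto& r : replicas) hops.emplace_back(client, r);
    return hops;
}

// Server-driven pipeline (HDFS-style): the client sends once to the first
// ChunkServer, and each server forwards to the next one in the chain.
std::vector<Hop> chain_hops(const std::string& client,
                            const std::vector<std::string>& replicas) {
    assert(!replicas.empty());
    std::vector<Hop> hops;
    std::string sender = client;
    for (const auto& r : replicas) {
        hops.emplace_back(sender, r);
        sender = r;  // the receiver becomes the next sender
    }
    return hops;
}
```

With three replicas, fan-out sends the chunk over the client's uplink three times while the pipeline sends it once; the price is roughly one extra hop of latency per replica. Server-driven schemes vary in detail: HDFS pipelines through DataNodes as above, while Ceph's primary OSD forwards to the other replicas itself.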

FUSE support is in the diagrams, but not checked off on the roadmap. Slow-node detection and avoidance seemed like one of the most interesting features from the design, but is not checked off either. Other things not even on the roadmap, using Gluster not as a fair comparison but as a handy list of possibilities: multiple replication levels, tiering, erasure coding, NFS/SMB, caching, quota, snapshots.
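The thread doesn't describe how BFS's slow-node detection actually works, so the following is only one plausible shape for such a feature. A minimal sketch, assuming a hypothetical `SlowNodeDetector` that keeps an exponentially weighted moving average (EWMA) of each ChunkServer's observed latency and flags nodes far above the median:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical detector: a node is "slow" if its EWMA latency exceeds
// `factor` times the median EWMA across all known nodes.
class SlowNodeDetector {
public:
    SlowNodeDetector(double alpha, double factor) : alpha_(alpha), factor_(factor) {
        assert(alpha > 0.0 && alpha <= 1.0 && factor > 1.0);
    }

    void record(const std::string& node, double latency_ms) {
        auto it = ewma_.find(node);
        if (it == ewma_.end()) {
            ewma_[node] = latency_ms;  // first sample seeds the average
        } else {
            it->second = alpha_ * latency_ms + (1.0 - alpha_) * it->second;
        }
    }

    std::vector<std::string> slow_nodes() const {
        std::vector<std::string> slow;
        if (ewma_.empty()) return slow;
        std::vector<double> values;
        for (const auto& kv : ewma_) values.push_back(kv.second);
        auto mid = values.begin() + static_cast<std::ptrdiff_t>(values.size() / 2);
        std::nth_element(values.begin(), mid, values.end());
        double median = *mid;  // upper median for even counts; fine for a sketch
        for (const auto& kv : ewma_) {
            if (kv.second > factor_ * median) slow.push_back(kv.first);
        }
        std::sort(slow.begin(), slow.end());  // deterministic order
        return slow;
    }

private:
    double alpha_;
    double factor_;
    std::unordered_map<std::string, double> ewma_;
};
```

A client or NameServer could then skip flagged nodes when choosing replicas; `alpha` controls how quickly a node that recovers stops being flagged.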

As I said, looks like a good first attempt. Better than most I've seen, with lots of potential, but as of today it seems rather bare-bones. Many hard problems remain to be solved, and I wish them well.

Jacky007 · on Nov 1, 2016

> Looks like a pretty good first attempt at a distributed filesystem.

You are damn right. It is. About 3 years ago, the most widely used DFS at Baidu was Peta, which is similar to HDFS v2. We have migrated to AFS now.

justinsb · on Oct 25, 2016

Looks nice. I know Raft better than most of the other pieces, so that's where I started; I didn't see code for dynamic membership changes nor log truncation. I can understand getting by with a fixed membership, but log truncation seems like a requirement for a production system. Would be interested to hear whether this is planned or whether there is a clever way around it!
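Log truncation in Raft is usually done by snapshotting the state machine and then discarding the log prefix that the snapshot covers. A minimal in-memory sketch, assuming a hypothetical `RaftLog` class (not the BFS or iNexus implementation), keeps only the index and term of the last discarded entry so the AppendEntries consistency check still works at the truncation boundary:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

struct Entry {
    uint64_t term;
    std::string command;
};

// Hypothetical Raft log supporting prefix truncation after a snapshot.
// Raft indices start at 1; index 0 stands for the empty log.
class RaftLog {
public:
    uint64_t last_index() const { return snapshot_index_ + entries_.size(); }

    void append(uint64_t term, std::string cmd) {
        entries_.push_back({term, std::move(cmd)});
    }

    // Term of the entry at `index`; valid from snapshot_index_ to last_index().
    uint64_t term_at(uint64_t index) const {
        assert(index >= snapshot_index_ && index <= last_index());
        if (index == snapshot_index_) return snapshot_term_;
        return entries_[index - snapshot_index_ - 1].term;
    }

    // Call once the state machine has been snapshotted through `index`
    // (in a real system, only for entries that are already applied).
    void compact(uint64_t index) {
        assert(index >= snapshot_index_ && index <= last_index());
        snapshot_term_ = term_at(index);
        uint64_t drop = index - snapshot_index_;
        entries_.erase(entries_.begin(),
                       entries_.begin() + static_cast<std::ptrdiff_t>(drop));
        snapshot_index_ = index;
    }

    uint64_t snapshot_index() const { return snapshot_index_; }
    std::size_t in_memory_entries() const { return entries_.size(); }

private:
    std::deque<Entry> entries_;
    uint64_t snapshot_index_ = 0;
    uint64_t snapshot_term_ = 0;
};
```

A real implementation would also persist the snapshot and need an InstallSnapshot RPC for followers that have fallen behind the truncation point; this sketch omits both.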

imafatboy · on Oct 25, 2016

Well, there's another project in the same organization, named iNexus, that implements log truncation. It uses LevelDB as its underlying storage, slightly modified to clean out outdated data during compaction. Maybe BFS will do something similar. For the source code, please refer to https://github.com/baidu/ins (sorry for the lack of English documents in this repo; we are working on it).

justinsb · on Oct 25, 2016

Thank you - excited to see this! I do think there is a lack of a standalone C++ library for Raft (and you have two projects just within Baidu that could share code). I'd be excited to help with a standalone project! And I'm sorry for my lack of non-English, but it seems that the variable names are still in English, so I can follow the code :-)

(It is a pity that Chrome doesn't automatically translate github pages that contain different languages - not sure why that isn't happening.)

usgroup · on Oct 25, 2016

Looking through the code, it supports FUSE, but the English documentation is sparse. It also looks like it underpins Tera, the Baidu distributed DB.

I think a low read/write latency DFS suitable for real-time applications would be a game changer. I'm hoping they improve the documentation from here and engage the English-speaking community.

lylei · on Oct 25, 2016

Thanks for your advice. We are working on translating all the documents :)

usgroup · on Oct 25, 2016

PS: if your DFS works within a docker container you'll have a very strong differentiator since the rest don't. You'd also possibly solve the "how to do storage in a container cloud without resorting to NAS or separate clusters" problem.

notacoward · on Oct 25, 2016

> if your DFS works within a docker container you'll have a very strong differentiator since the rest don't.

Untrue. Gluster is already deployed that way in many places. Yes, in production and at scale.

usgroup · on Oct 25, 2016

Do you mean hackery of this sort:

http://blog.xebia.com/persistence-with-docker-containers-tea...

Or do you know of a clean, container-only solution (no plugins or special external tools)?

notacoward · on Oct 25, 2016

Oh, sorry, didn't realize we were playing the "move the goalposts" game. If you were to google for "gluster" and "containers" you'd get everything from slick marketing stuff to a presentation at the recent Gluster developer summit in Berlin. I have no idea if any of those would meet your next set of standards but, frankly, meh.

usgroup · on Oct 25, 2016

Container hosting with a homogeneous cluster constraint was a real requirement for me that I could not find a solution for among existing options, but since you're a Gluster dev you'd probably know better whether it's possible, so I'm happy to stand corrected. Thanks for correcting me, and no offence intended.

notacoward · on Oct 25, 2016

It is certainly possible. The first user I know of who did this was using Mesos. Nowadays the push is more around doing it with Kubernetes and OpenShift; I know there was at least one presentation on it at Red Hat Summit. I'm a core-infrastructure guy, so that's kind of not my bailiwick, but if there's nothing in Gluster's own documentation about such things there might be something in one of those other communities.

placeybordeaux · on Oct 25, 2016

I spent a good bit of time yesterday just throwing all the docs into google translate. Tera looks really interesting, but currently there is no way I can use it unless there is documentation in my native language :/

Monotoko · on Oct 25, 2016

Hey, don't suppose you know if the Baidu Maps team will be translating their docs anytime soon? (specifically the android API docs)

daviesliu · on Oct 25, 2016

BFS has a very limited FUSE client.

There is another distributed file system that supports full POSIX semantics with a well-tuned FUSE client, called MooseFS [1].

My ex-employer used it in production for about 8 years; the biggest cluster has more than 2 PB.

Disclosure: I'm a MooseFS fan and contributor :)

[1] http://moosefs.org/

bluejekyll · on Oct 25, 2016

Speaking of documentation, I see almost no comments in the code.

It all looks reasonable, but this takes self-documenting to an extreme.

pkolaczk · on Oct 26, 2016

The code is not very self-documenting either. There are functions over 50 lines long with mixed levels of abstraction, and error-handling code is completely intertwined with the logic. I find such code quite hard to read, and the lack of comments doesn't help.

espadrine · on Oct 25, 2016

Based on the design[1], it has a leader / follower pattern (although you should have multiple leaders with Raft consensus to avoid having a single point of failure), where the leader is called "nameserver" and decides where to put each piece of data and metadata among a set of chunk servers and metadata servers.

That design is very reminiscent of CephFS's cluster monitors, metadata servers, object storage devices.

[1]: https://github.com/baidu/bfs/blob/master/docs/design.md, https://github.com/baidu/bfs/blob/master/docs/BFS_design.md

WeaselNo7 · on Oct 25, 2016

I thought Raft was always-single-leader? Followers can happily become leaders through elections?

espadrine · on Oct 25, 2016

Sorry, ambiguous choice of term. The set of nameservers "lead" the rest of the server cluster. Within the set of nameservers, they make decisions by electing a leader among them.
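Leader election among the nameservers relies on majority quorums. A small sketch of that arithmetic, with illustrative helper names that are not from BFS:

```cpp
#include <cassert>
#include <cstddef>

// Votes needed to win an election (or commit an entry) among n servers.
std::size_t quorum_size(std::size_t n) {
    assert(n > 0);
    return n / 2 + 1;
}

// Servers that can fail while a majority is still reachable.
std::size_t tolerated_failures(std::size_t n) {
    return n - quorum_size(n);
}

// A candidate becomes leader once it holds a majority (its own vote included).
bool wins_election(std::size_t votes, std::size_t n) {
    return votes >= quorum_size(n);
}
```

This is why Raft groups are usually sized at 3 or 5: adding a fourth server raises the quorum from 2 to 3 without tolerating any additional failure.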

ergo14 · on Oct 25, 2016

What is the leader/follower pattern? Something like the master/slave approach?

omginternets · on Oct 25, 2016

It's the PC version of master/slave. See [0] for the original madness...

[0] https://github.com/antirez/redis/issues/3185

idealpersona · on Oct 25, 2016

Good fucking god. This is insane. And anyone who opposed the proposal, even while pointing out the fallacy of the core idea, got downvoted to hell too. This gives me a lot of context for what I saw in the last season of South Park.

kuschku · on Oct 25, 2016

Even worse, look at GitHub banning repos for using offensive words: https://news.ycombinator.com/item?id=9966118

placeybordeaux · on Oct 25, 2016

antirez handled that pretty well.

omginternets · on Oct 26, 2016

Yeah, I'd say so too. IMHO it was a mistake to use the docs for political purposes, mostly because doing so lends legitimacy to the issue.

Where I think he did well was in de-escalating tension.

kuschku · on Oct 25, 2016

Leader/follower is actually something entirely different, but the linked Chinese document talks about a master/client approach.

Sadly, in the past years, due to some political movements, the term "master/slave" has been declared problematic, and GitHub actively warns that projects using such language can and will be excluded from the service.

There have been previous discussions about this on HN.

Cyph0n · on Oct 25, 2016

Wow, that's very interesting. I thought GitHub delegates moderation to the repo owners.

There was actually a huge debate about this on Reddit caused by Swift merging a rename change PR into master. The Swift team was so excited about the change for some reason that they didn't even run tests before the merge...

kuschku · on Oct 25, 2016

As I mentioned in other subthreads (sadly I can’t edit the original comment anymore, so I have to duplicate content), there is this very famous example of several repos getting banned, and another getting threatened to be banned, for using the word "retard": https://news.ycombinator.com/item?id=9966118

ergo14 · on Oct 25, 2016

Do you have a link for that? I don't follow swift development and I'm unsure what they actually changed.

Rename of what?

Cyph0n · on Oct 25, 2016

Changing variable names using master/slave to leader/follower.

Here's my comment from the thread:

https://reddit.com/r/ProgrammerHumor/comments/3veu2t/comment...

The rest of the discussion is a good read too.

ergo14 · on Oct 25, 2016

GitHub says that? Can you share a link? I thought moderation was up to users unless someone is actually abusing the service.

kuschku · on Oct 25, 2016

Read this case, for example: https://news.ycombinator.com/item?id=9966118

gcr · on Oct 25, 2016

Where has GitHub warned about this? I can't find any official documentation on it.

kuschku · on Oct 25, 2016

I can’t find the specific case of "master/slave", but you can find a similar case on "retard" here: https://news.ycombinator.com/item?id=9966118

int_handler · on Oct 26, 2016

Are these conceptually/semantically different from master/worker?

revelation · on Oct 25, 2016

They need to stop saying "real-time". Real-time does not mean "fast", it means "guaranteed performance". This is nothing of the sort.
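The distinction can be made concrete: a system can be fast on average and still miss a hard deadline. A minimal sketch with made-up latency samples (not BFS measurements) and illustrative function names:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Average latency: what "fast" usually means.
double mean_ms(const std::vector<double>& latencies) {
    assert(!latencies.empty());
    return std::accumulate(latencies.begin(), latencies.end(), 0.0) /
           static_cast<double>(latencies.size());
}

// A hard real-time guarantee is about the worst case: every operation must
// finish within the deadline, so a single miss breaks the contract.
bool meets_deadline(const std::vector<double>& latencies, double deadline_ms) {
    return std::all_of(latencies.begin(), latencies.end(),
                       [deadline_ms](double l) { return l <= deadline_ms; });
}
```

For nine 1 ms operations and one 31 ms operation, the mean is 4 ms, yet a 10 ms deadline is still missed once, which is enough to break a hard real-time guarantee.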

anilgulecha · on Oct 25, 2016

Can a distributed storage expert comment on the ways this differs from Hadoop?

00k · on Oct 25, 2016

First of all:

- implemented in C++ (JVM/GC is a pain in the ass)

- clear architecture (only master and dataserver)

- very concise config file and easy to deploy

- most important, 10k-node scalability without a federated namespace design

pkolaczk · on Oct 25, 2016

Lack of good documentation, no tests and possibly undefined behaviour in a few places. The code also doesn't look any cleaner than HDFS and uses some weird mix of C (*printf, error codes) and C++ (vectors, smart pointers, RAII etc).

jstimpfle · on Oct 25, 2016

> weird mix of C (*printf, error codes) and C++ (vectors, smart pointers, RAII etc).

Haven't looked at any code, but what you describe is very common usage.

gravypod · on Oct 27, 2016

For the distributed FS people out there, I've got a complicated question. In my job I need to poll and collect data from many remote sensor devices, log all the output, and process it. Not only do I do this, but MANY of my colleagues do too, each with a different way of managing the process. Can a distributed file system help in this case?

Is there any file system that would be able to sync what amounts to text/binary data across many hosts and allow me to aggregate the data off the network for more secure storage?

I was thinking about using IPFS for this, but this also seems better. I'd like to have a private network for this use case so that other people can't put up a device on this file system and introduce fake data.

chubot · on Oct 25, 2016

Interesting: Google flags, protocol buffers, and Google C++ style.

puzzle · on Oct 25, 2016

If you look at Baidu's infrastructure, it's almost like a parallel universe where the names are identical or almost identical to Google's: BFE, GTC, GSLB. And BFS does look a lot like GFS2 aka Colossus.

keketi · on Oct 25, 2016

Really makes you think...

int_handler · on Oct 26, 2016

Seems that most of Baidu's C++ open source projects have this pattern as well, albeit with minor variations on the Google C++ style such as 4 spaces for indentation.

jpgvm · on Oct 25, 2016

More like a C++ clone of HDFS than most people are likely hoping. While you seem to be able to mount it with FUSE I imagine it's primarily meant to be programmed against directly.

Using Raft over a dependency on an external consensus system is nice. Definitely makes the namenode architecture much better.

ciucanu · on Oct 25, 2016

It looks like a faster version of HDFS since it's written in C++ (vs Java).

Another important aspect is that it uses SSD + SATA (I suppose), which could be a better option than plain SATA/SSD or an LV cache using SATA + SSD.

Even if it's a new thing, if it proves to be faster it may be adopted by the Hadoop ecosystem in the future. HDFS has a lot of features, being a mature piece of software, but it falls short on response time.

pkolaczk · on Oct 25, 2016

"It looks like a faster version of HDFS since it's written in C++ (vs Java)."

This is a non sequitur. The conclusion does not follow from the premise.

otterley · on Oct 25, 2016

During non-GC periods, probably true. But having a realtime filesystem service that is prone to stop-the-world GC pauses is a showstopper for many applications.

Also, a C++ implementation is likelier to use far less memory than a Java implementation, assuming the skills of both programmers are roughly equal.

pkolaczk · on Oct 25, 2016

The underlying local filesystem on each node is not truly real-time, so a "real-time distributed file system" is already quite a stretch. Also, the JVM is perfectly fine with worst-case pause times below a few tens of ms (with a properly tuned G1 or CMS collector), which is lower than the worst-case latency induced by network + I/O.

As for using less memory - you don't allocate buffers for file data on the JVM heap. You allocate them in native memory exactly as you'd do it in C++. Therefore it is possible to create a JVM-based file system that handles petabytes of data with just as little as 100 MB heap, used mostly for small temporary objects.

Also, the code here is using mutexes a lot to synchronize threads and lock out whole objects. Therefore I think these "realtime" claims are quite exaggerated.

GauntletWizard · on Oct 25, 2016

You're using the academic version of real-time, not the one that anybody cares about. HDFS's biggest problem is, and has always been, that it's literally impossible to tune it to give anything like reliable performance, mostly because the nameserver is a single point of lag for the entire system. "Worst-case network and IO" latency is a huge stretch. Network performance is predictably sub-ms if you're using a network designed for modern distributed computing (a real stretch, I know, since almost all HDFS installations are on old-school core-router-tree infrastructure). The IO operations are incredibly unpredictable, but only for one client at a time. Having individual servers with 10-20 ms worst-case performance hiccoughs is nowhere near as bad for a system as all of your clients hiccoughing for even 5 ms at the same time.

pkolaczk · on Oct 25, 2016

HDFS's biggest problem is its SPOF master-slave architecture, not the JVM or GC. With a truly distributed shared-nothing system, Java GC would not be a problem, because servers can now run for hours or days with no major GC. So two servers or clients doing GC at the same time is very unlikely. And even if some of them do, GC pauses are much more predictable than I/O pauses, which on a loaded system can take seconds, not milliseconds.

Also if GC was such a huge problem, exchanges or HFT companies wouldn't use Java for their low latency stuff, and there definitely are companies which do.
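Both arguments above turn on how often stalls coincide. Assuming each server stalls independently with probability p at any given moment (an assumption that a shared NameServer breaks, since its stall hits every client at once), a short sketch of the two relevant probabilities, with illustrative function names:

```cpp
#include <cassert>
#include <cmath>

// Probability that a request touching n servers hits at least one stalled
// server, assuming independent stalls with probability p each.
double p_any_stalled(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 0);
    return 1.0 - std::pow(1.0 - p, n);
}

// Probability that all n servers are stalled at the same moment
// (same independence assumption).
double p_all_stalled(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n >= 0);
    return std::pow(p, n);
}
```

With p = 0.01, a request fanned out to 100 servers hits at least one stalled server about 63% of the time, while two specific servers being stalled together has probability 0.0001.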

otterley · on Oct 26, 2016

> Also if GC was such a huge problem, exchanges or HFT companies wouldn't use Java for their low latency stuff, and there definitely are companies which do.

Can you name one?

pkolaczk · on Oct 26, 2016

LMAX, New York Exchange.

otterley · on Oct 26, 2016

Wow, that's neat. Thanks for the pointer!

otterley · on Oct 25, 2016

> As for using less memory - you don't allocate buffers for file data on the JVM heap.

I meant the code size and heap allocations for data structures, not file buffers.

And 100MB is huge compared to many C++ programs. And that's on top of the Java runtime!

pkolaczk · on Oct 25, 2016

Sure, and this C++ DFS's memory use is probably huge compared to many hand-crafted assembly or C programs from the 1980s. But who cares? 100 MB or even 1 GB is really tiny for today's server hardware. And the Java runtime itself is really only a few MB. What takes most of the memory in many Java programs (e.g. IDEs) is code and libraries.

otterley · on Oct 26, 2016

Size can lead to a tremendous difference in performance on modern CPUs, particularly if you can take advantage of L2/L3 instruction and data caches. It still matters, even on modern "big memory" systems where gigabytes of installed RAM are the norm.

pkolaczk · on Oct 26, 2016

Technically correct, but filesystems are mostly about I/O. For example this Baidu filesystem copies blocks of data into userland memory and transfers them in RPC messages - any system using proper zero copy approach would easily beat it even if coded in Python or JS. Baidu also seems to use threads, locks and SEDA instead of more efficient (but much harder to code) thread-per-core async architecture. Threadpools and lock based synchronization are terrible for latency.

The fact that something is in C++ doesn't make it automatically efficient. And particularly if we're talking about milliseconds, not nanoseconds, in Java or C# you can do everything you can do in C++, performance-wise.

jprince · on Oct 25, 2016

What's the difference then between this and MapR, besides a similar CLDB pattern? (No single point of failure)

luibelgo · on Oct 25, 2016

Is there any benchmark available?

NicoJuicy · on Oct 25, 2016

What amazes me most is when they change titles: one Chinese character sometimes matches one English word.

Eg. lylei changed the title from "cs启动太慢" to "cs start is too slow " ( on https://github.com/baidu/bfs/issues/376 )

lylei changed the title from "其他SDK写策略" to "SDK writing strategies(fan-out write for example)" (on https://github.com/baidu/bfs/issues/243 )

toxik · on Oct 25, 2016

"cs启动太慢" means "cs start-up too slow," where 启动 is start-moving and likely a verb-result construction, a pattern in Chinese that to my knowledge doesn't exist in Germanic languages. The second one is more accurately translated to "Other SDK writing strategies."

jeffbax · on Oct 26, 2016

Not commenting on BFS so much as saying it's really cool to see large Chinese companies contributing to open source. Does anyone know of other large projects outside of the main Android forks? Pardon the ignorance.

Also wonder if there will be larger skepticism toward integrating Chinese O/S in regards to potential influence by the government (like the NSA has tried to influence in the past)

marknadal · on Oct 25, 2016

This looks extremely promising. I work on distributed systems, in particular on databases (one abstraction layer above file systems). This looks like it would make a really nice storage engine for https://github.com/amark/gun . Also, it is nice to see non-English projects! Very exciting work.

vonnik · on Oct 25, 2016

How is this better/different than HDFS? Is this simply an example of NIH?

khc · on Oct 25, 2016

What I really want to know:

"Once your code has passed the code-review and merged, it will be run on thousands of servers"

And the Chinese text below says tens of thousands of servers, which is it? :-)

muddyrivers · on Oct 25, 2016

Considering Baidu's scale, it would be tens of thousands.

There are several other discrepancies in the doc between the Chinese version and the English one. Some technical proofreading is needed.

kinkrtyavimoodh · on Oct 25, 2016

In this case, I think it's fine. Chinese has a named number for 10000 (wàn/万), so they used that. Since English doesn't, they used 'thousands'. In either case, the idea is that the code would run on a large number of servers.

For instance, Hindi has special names for 100000 (lakh), 10M (crore/karod) etc. so a similar translation to Hindi would use those even if it meant introducing a factor of 10 in the literal interpretation.

HammadB · on Oct 25, 2016

Has anyone found a good deep-dive on the architecture in english?

merb · on Oct 25, 2016

I wonder why they chose to rewrite Raft rather than use etcd or another working Raft implementation.

chronid · on Oct 25, 2016

You usually don't want to add another dependency you don't control to your system if you don't have to (in terms of time and resources).

It's another point of failure, more infrastructure you have to keep alive...

andrewclunn · on Oct 25, 2016

Wasn't there already a file system named BFS? This might get confusing.

codezero · on Oct 25, 2016

BFS was part of BeOS which is defunct. The creator went on to work at Apple.

andrewclunn · on Oct 25, 2016

Is it not still used by Haiku?

faizshah · on Oct 25, 2016

I wonder how it compares to Quantcast's QFS, anyone know?

sshb · on Oct 25, 2016

Does it support IPv6?

p1mrx · on Oct 25, 2016

The underlying socket libraries might in theory, but they're using them poorly. Example from nameserver_main.cc:

    std::string listen_addr = std::string("0.0.0.0") + server_addr.substr(server_addr.rfind(':'));
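As quoted, that line always binds the IPv4 wildcard `0.0.0.0`, and it throws `std::out_of_range` if `server_addr` contains no colon, because `substr(npos)` is past the end of the string. A minimal sketch of a family-preserving alternative, using a hypothetical helper `wildcard_listen_addr` and assuming the RPC layer accepts bracketed IPv6 literals such as `[::]:8000`:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: turn "host:port" into a wildcard listen address on the
// same port, keeping the address family. IPv6 hosts must be bracketed,
// e.g. "[2001:db8::1]:8000", since a bare IPv6 address is ambiguous.
std::string wildcard_listen_addr(const std::string& server_addr) {
    std::string::size_type colon = server_addr.rfind(':');
    if (colon == std::string::npos || colon + 1 == server_addr.size()) {
        throw std::invalid_argument("expected host:port, got " + server_addr);
    }
    std::string host = server_addr.substr(0, colon);
    std::string port = server_addr.substr(colon + 1);
    assert(port.find(':') == std::string::npos);  // guaranteed by rfind
    bool is_ipv6 = host.size() >= 2 && host.front() == '[' && host.back() == ']';
    // The quoted code always used the IPv4 wildcard "0.0.0.0".
    return (is_ipv6 ? "[::]:" : "0.0.0.0:") + port;
}
```

Whether a `[::]` listener also accepts IPv4 clients depends on the platform's dual-stack settings, so a deployment that needs both families should check that separately.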

andeb · on Oct 25, 2016

Seems good! But I have tried to uninstall it, and I can't...

qwertyuiop924 · on Oct 25, 2016

Anybody know how this differs from AFS?

knorker · on Oct 25, 2016

I think AFS is still only replicated for read-only, not for read-write.

qwertyuiop924 · on Oct 25, 2016

No, it's replicated read-write, but according to wikipedia, file locks are only machine-wide, so write collisions are easy to create.

knorker · on Oct 25, 2016

Are you sure? From that same wikipedia:

"AFS volumes can be replicated to read-only cloned copies."

qwertyuiop924 · on Oct 25, 2016

They can be replicated to read-only copies, but AFS also supports multiple machine writes.

fsiefken · on Oct 25, 2016

How does this compare to IPFS?

haosdent · on Oct 25, 2016

Nice work!

taotaowill · on Oct 25, 2016

awesome

techolic · on Oct 25, 2016

Care to explain the awesomeness you found?

thinkMOAR · on Oct 25, 2016

i wanted to write awesome too, so i'll be more detailed :)

Seems (have not tried it yet) awesome because:

- another big party offering such software; the more choices, the merrier for the users/sysadmins

- sandboxed

- scalable to 10k nodes

- no single point of failure

- SSD and traditional disk usage via the disk manager

stomato · on Oct 25, 2016

I see many positive comments that are not downvoted, so when you downvote someone for saying "awesome", I suspect it is because you disagree, not because it was a low-value post, which would be the reason to downvote. Also, your response was "explain why"; again, I don't usually see people questioning every acclamation.

techolic · on Oct 26, 2016

For the record I didn't downvote GP, at the time I asked there were only two comments and I was in the mood for learning as this isn't my area.

detaro · on Oct 25, 2016

No, it's pretty clearly downvoted because it contains no elaboration at all, which with one exception all other non-gray posts do.

Dowwie · on Oct 25, 2016

Released right on the heels of the IPFS announcement?

daenney · on Oct 25, 2016

"The" IPFS announcement? IPFS itself has been around for quite a while. Could you be more specific to which announcement you're referring, and how that relates to BFS?

asitdhal · on Oct 25, 2016

Why is there no English documentation?

lylei · on Oct 25, 2016

Working on it. ReadMe, issues and PRs are just a beginning. We definitely want to involve as many contributors as possible.

nowayyeah · on Oct 25, 2016

Because it's written by Chinese developers for a Chinese company?

asitdhal · on Oct 25, 2016

They put that code in github, releasing it to non-chinese developers.

Wouldn't it be good if the code had at least some documentation in English? It's not that they don't want that; some parts are already in English.

robjan · on Oct 25, 2016

GitHub isn't just for English users

panglott · on Oct 25, 2016

The hegemony of English is unquestionable.

StreamBright · on Oct 25, 2016

Last time I checked English was the 3rd most spoken language on the planet.

- Chinese

- Spanish

- English

You mean in tech?

joelwilliamson · on Oct 25, 2016

That's ordering by native speakers. If you count non-natives, English has far more speakers than Spanish (but far less than Mandarin).

kinkrtyavimoodh · on Oct 25, 2016

Also, English is the most widely spoken too, by a wide wide margin.

coldtea · on Oct 25, 2016

On purpose, to give English-speaking programmers a taste of what it is like for non-English speakers too /s

hardwaresofton · on Oct 25, 2016

Why is there rarely any non-English documentation for most codebases nowadays? The world doesn't revolve around English-speaking countries.

thegeomaster · on Oct 25, 2016

Because English is the lingua franca of the software industry, and developers are usually expected to know English, no matter where they're from.

hardwaresofton · on Oct 25, 2016

The point of my comment was that it shouldn't necessarily be the case forever, and it might not be reasonable to expect it to be.

thegeomaster · on Oct 25, 2016

Sure, nothing ever stays the same, but I think that for the foreseeable future, we can reasonably expect English to stay the universal language of software development. It's the default foreign language people learn in their home countries for a lot of reasons, so a lot of people who want to get into the field already have at least a basic understanding of English. This aids them in learning and communication with other developers, and it's just too convenient to be displaced any time soon.

justincormack · on Oct 25, 2016

Because it is a lot of work. Nginx always had good Russian docs, for reasons you can guess.

ominous · on Oct 25, 2016

Why should there be?

ricardobeat · on Oct 25, 2016

Why is the summary in English?

LinuxFreedom · on Oct 25, 2016

Hopefully the Chinese will not be as arrogant as you when they take over the leading role on this planet.

asitdhal · on Oct 26, 2016

I was not arrogant. I am not a native English speaker. I expect most software to have documentation in English.

Do you want to learn 5 different languages just because 5 different kind of people write good software ?

It's easy to blame others.

gnipgnip · on Oct 25, 2016

I just want to note how comments complaining about Mandarin being the dev language get downvoted (by sympathetic Europeans);

while at the same time so do those complaining about English's hegemony in India (by furious Indians).

Strange is our world.

rwallace · on Oct 25, 2016

Maybe both kinds of comments are being downvoted by those of us who like technical conversations not to be full of people bitching about other people's choice of language. I don't find that strange at all.

jstoja · on Oct 25, 2016

Too bad that the documentation is so poor... Having PRs in Chinese is not ideal either.

kzrdude · on Oct 25, 2016

Do you even know how it is when not-your-own-language is the dominant one for everything in computers?

provemewrong · on Oct 25, 2016

I know how it is, and I'd take English docs over my native language any day.

nowayyeah · on Oct 25, 2016

Sssshhh don't point out the hypocrisy.

ricardobeat · on Oct 25, 2016

Is it hypocrisy? English is not my native language but I consider it the default CS language. There must be a way for us to share knowledge, and that turned out to be English.

gbog · on Oct 25, 2016

Yes, but it is a problem. Unlike computer languages, human languages do not only convey pure meanings, i.e. pure descriptions of relations between entities. They carry a full baggage of culture, so even if it is convenient and pragmatic to use English as the main language in CS, it is not neutral: it is both an effect and a cause of the Anglo-Saxon cultural, economic and military hegemony over the world.

sleepychu · on Oct 25, 2016

https://github.com/search?q=baidu&type=Everything&repo=&lang...

I feel like we should just make up random strings when we name things...

gnipgnip · on Oct 25, 2016

Can't speak much for the project, but have to admire them for sticking with Mandarin.

India is at least 100-200 years away from something of this kind happening; or more likely never at all.

Indeed, there is not a single research university worth its name that isn't also essentially an export hub of brains to the 5-eyes (& Singapore).

English is crucial for India's system of feudal slavery to work.

erikb · on Oct 25, 2016

I don't think they stick to Mandarin because they want to. English is considered hip and intellectual in China, especially in the first tier cities which I suppose the developers of such modern technology are. But the problem is that English is really, really hard for Chinese native speakers, since grammar, words, culture and pronunciation are so different from all the Chinese languages.

So as a team leader or project manager in China I would probably also stick with Chinese since it is much easier to find really good and not too expensive employees that way. Let them try to use English, support the ambition, but don't enforce it.

And I don't know much about India, but from what I've heard, it is more a cultural issue that India lags behind. Everything is (so I've heard) still very traditional and backward-focused, while China as a country spent 20-30 years becoming more open to new ideas and approaches. How true is that from other people's perspective here?

gnipgnip · on Oct 25, 2016

There are deep social divisions in India, considering its tortured (and propagandized) history.

http://sankrant.org/2011/03/the-english-class-system-2/

There are systematic faults, which prevent much change, if the current policies are kept up (note: India's literacy rate is ~78 %, since literacy (except in English) brings no great advantage).

http://www.nytimes.com/2015/03/22/opinion/sunday/how-english...

http://www.forbes.com/sites/realspin/2014/11/06/the-problem-...

Imagine China with only the expensive class of engineers, for instance. Or at least, one with this class, and another class that was educated in English but barely knows the language, let alone possesses any usable skill.

There are now villages, driven by this economics, where rural-children are being taught in English. Considering how bad the Japanese/Chinese are with English, it shouldn't be hard to interpolate how disabling this is when everything is being taught in a foreign language.

giis · on Oct 25, 2016

>India is atleast a 100-200 years away from something of this kind happening; or more likely never at all.

What do you mean by this statement? Do you even know that GlusterFS (which later became Red Hat Storage) was developed in India?

First try to understand the context before starting your racist rants.

Disclaimer: I'm an ex-GlusterFS dev.

gnipgnip · on Oct 25, 2016

That India is a linguistic-apartheid state doesn't contradict the fact that the crème de la crème is quite good (very apparent from the population in the US).

FraaJad · on Oct 25, 2016

Why would India use Mandarin?

English is the most common script in India; why would we use anything else? Unless you are a Hindian trying to impose your minority language on the rest of us.

gnipgnip · on Oct 25, 2016

Wow. Just wow.

- English is hardly the "most common script". Just because the retainers in Delhi impose the colonial apparatus on us, doesn't automagically give it "statistical power" as well.

- Every state has a (poor, uneducated, illiterate) captive linguistic population larger than that of Korea; there is no reason they ought to use Hindi, or even Nagari (script != language, in case your education didn't tell you that).

- Among the rich, yes, English is most common, and this is really what matters in the end, ain't it?

This is precisely why India will never be able to work in its own language, and also precisely why it is doomed to eternal poverty and continued illiteracy. Probably will remain a hub exporting little other than people, for the next couple centuries.

(See: https://youtu.be/SJx0KFtm9Rw?t=21m56s)

And this is why I admire China. It's not democratic, it has a paranoid regime, but at least they aren't run by hypocrites who'd use a "socialist democracy" as cover for continued colonization.

> trying to impose your minority language on the rest of us.

Your skills in generating irony amuse me.

wstrange · on Oct 25, 2016

Note the debate you are having with the GP is in English.

For better or worse, English is the common language for software development (and aviation, etc.)

gnipgnip · on Oct 25, 2016

> Note the debate you are having with the GP is in English.

You don't say?

> For better or worse, English is the common language for software development (and aviation, etc.)

English in India is more dense than the feudal castes of medieval Europe; hardly a professional thing this.

The Human-rights wallahs don't complain precisely because the current state of affairs benefits the nations that control them; much as it does their native retainers.

GoToRO · on Oct 25, 2016

https://github.com/baidu/bfs/blob/master/src/client/bfs_clie...

    std::string pad;
    if (path[path.size() - 1] != '/') {
        pad = "/";
    }

Else?

yvxiang · on Oct 25, 2016

It's an interface that's no longer in use. Would you like to open an issue or make a PR to fix it? :)

GoToRO · on Oct 25, 2016

:) send me your billing details.

kjs3 · on Oct 25, 2016

You took time to find it and complain, but want to bill to fix? Classy.

toxik · on Oct 25, 2016

The default constructor for std::string produces the empty string.

GoToRO · on Oct 25, 2016

I imagined that. I am more like "let me tell you exactly what I want" rather than relying on behavior defined elsewhere. It would not pass review where I work.

knorker · on Oct 25, 2016

Not to be rude, but I'm glad I don't work where you work.

How would that be consistent with your own classes? "Oh no, you can't just use a 'Tree' object, you need to explicitly set that there are no leaves yet, no branches yet, no squirrels yet, etc… etc…"

Do you .clear() your vectors before you use them?

This sounds like newbies that do: #define TRUE (1 == 1)

GoToRO · on Oct 25, 2016

It's not about initialization, it's more about specifying clearly what happens in all cases.

Anyway, the reason it would not pass review is that today you use one compiler, tomorrow you have to use another, and then you have to review all these little details again. It's about saving money more than anything, and you do that by not relying on compiler behavior.

bn-usd-mistake · on Oct 25, 2016

It is specified clearly though. The behavior is not compiler dependent, it's specified in the C++ Language standard. See http://en.cppreference.com/w/cpp/language/default_initializa... and http://www.cplusplus.com/reference/string/string/string/

If a different compiler breaks this behavior, it's not standard compliant and thus could do all sorts of stuff in every possible line, including in:

    std::string pad = "";

GoToRO · on Oct 25, 2016

Standards change. Don't they?

knorker · on Oct 25, 2016

So how can you use operator= or the copy constructor of string?

The '= ""' won't help you.

consz · on Oct 25, 2016

Do you think that's any more likely than the constructor of std::string changing?

knorker · on Oct 25, 2016

Sure, I would have preferred a non-branching:

    std::string pad = descriptive_name_here(path);

with the added bonus of being able to add "const" to that, for the benefit of the reader.

This is not relying on compiler implementation! Can you name one language that has strings that initialise to anything but a valid object containing an empty string?

This is not an obscure side-effect. This is like assuming "std::vector<int> v;" creates an empty vector, not an undefined-state vector container.

(I don't want someone coding C++ as if all objects are references. Coding in one language as if it were another is a well-known antipattern)

rbadaro · on Oct 25, 2016

In Java the default value for a String (or any object) is null, not an empty string.

knorker · on Oct 25, 2016

Did you not read my whole comment? Please read the whole thing before replying.

rbadaro · on Oct 29, 2016

Read the whole comment. Still don't see anything that invalidates my answer to "Can you name one language that has strings that initialise to anything but a valid object containing an empty string?".

knorker · on Oct 31, 2016

You specifically mentioned Java, and I specifically mentioned reference-based languages, with Java being the most obvious example.

knorker · on Oct 25, 2016

Looks like I can't edit, so I'll create another comment.

I ran into this article that puts quite nicely why the problem isn't the "else", but the "if" itself:

https://medium.com/@bartobri/applying-the-linus-tarvolds-goo...

This is what I meant in the other comment by preferring the non-branching.

pritambaral · on Oct 25, 2016

> the reason for which it would no pass review is that today you use one compiler, tomorrow you have to use another

Good thing, then, that it's mandated by the language reference and not up to the compiler to decide. According to C++11, §21.4.2/1, a default-constructed std::string should be an object of class std::basic_string with non-null data and a size of 0.

GoToRO · on Oct 25, 2016

That define is very clever: it always lands on its feet! :)

knorker · on Oct 25, 2016

Except there's already a 'true'. When I see this, I know that whoever wrote it is completely incompetent (as in "does not know programming", not "is stupid").

It's clever, yes. The bad kind of clever that's also misguided.

GoToRO · on Oct 25, 2016

Sometimes there is a 'true', sometimes there isn't. Sometimes the code is new, sometimes it is legacy code. You are making too many assumptions. When I see this I know that...

knorker · on Oct 25, 2016

The point of defining TRUE as (1==1) is that it's "future proof" in case implicit conversion to bool ever ends up in a world where 0 evaluates to "true".

That would break approximately ALL C code.

You're being ridiculous. You might as well try to protect against the meaning of "if" changing.

I've seen amateur code that tries to protect against "stdio.h" going away and therefore reimplementing everything in it. This is like that.

Believing that the meaning of everything can change means that you cannot use anything you didn't code yourself. If you can't trust documented APIs, then you're some sort of programmer NIH nihilist.

noselasd · on Oct 25, 2016

But to a C++ programmer, that code tells exactly what is going on... There's no hidden logic, no non-standard or compiler dependent features.

stomato · on Oct 25, 2016

It's a distributed real-time filesystem. I'm guessing optimizations are key?

GoToRO · on Oct 25, 2016

Probably.

yorwba · on Oct 25, 2016

I have not used C++ much, but isn't pad default-constructed to be an empty string here?

snnn · on Oct 25, 2016

What if path is empty?

pkolaczk · on Oct 25, 2016

Then they have a problem. Undefined Behavior.