Hacker News
Baidu File System – A distributed file system for real-time applications (github.com/baidu)
239 points by bluebore on Oct 25, 2016 | 162 comments


Disclosure: I'm a Gluster developer.

Looks like a pretty good first attempt at a distributed filesystem. Initial impression is HDFS with a distributed NameNode/Nameserver. The first diagram also shows a Metaserver layer that's not mentioned at all in the more recent of the two design docs but "separate Metaerver from Nameserver" appears (unchecked) in the roadmap. All operations using access methods other than their own SDK seem to get funneled through the NameServer cluster, which will severely limit throughput. Not clear how they do replication, though weakly implied that it's driven from the client (like Gluster) or NameServer rather than the first ChunkServer (like Ceph, HDFS, everything else). No mention of how they handle consistency or repair. Likewise no information about performance or security. Not clear if it's anywhere near POSIX compliant (probably not).

FUSE support is in the diagrams, but not checked off on the roadmap. Slow-node detection and avoidance seemed like one of the most interesting features from the design, but is not checked off either. Other things not even on the roadmap, using Gluster not as a fair comparison but as a handy list of possibilities: multiple replication levels, tiering, erasure coding, NFS/SMB, caching, quota, snapshots.

As I said, looks like a good first attempt. Better than most I've seen, with lots of potential, but as of today it seems rather bare-bones. Many hard problems remain to be solved, and I wish them well.


> Looks like a pretty good first attempt at a distributed filesystem.

You are damn right. It is. About 3 years ago, the most widely used DFS at Baidu was Peta, which is similar to HDFS V2. We have migrated to AFS now.


Looks nice. I know Raft better than most of the other pieces, so that's where I started; I didn't see code for dynamic membership changes nor log truncation. I can understand getting by with a fixed membership, but log truncation seems like a requirement for a production system. Would be interested to hear whether this is planned or whether there is a clever way around it!
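For anyone unfamiliar with the term, here is a minimal sketch of what log truncation (prefix compaction) means. It is not taken from BFS or iNexus; the class and method names are made up for illustration, and it assumes the caller has already captured the state machine in a snapshot before dropping entries. A real Raft implementation would also record the term of the last snapshotted entry.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>

// Illustrative in-memory log with 1-based indices; all names are hypothetical.
class CompactableLog {
public:
    // Appends an entry and returns its log index.
    uint64_t Append(const std::string& entry) {
        entries_.push_back(entry);
        return LastIndex();
    }

    uint64_t FirstIndex() const { return snapshot_index_ + 1; }
    uint64_t LastIndex() const { return snapshot_index_ + entries_.size(); }

    // Drops every entry with index <= upto. Assumes a snapshot covering
    // those entries has already been persisted by the caller.
    void TruncatePrefix(uint64_t upto) {
        assert(upto <= LastIndex());
        while (snapshot_index_ < upto) {
            entries_.pop_front();
            ++snapshot_index_;
        }
    }

    // Returns the entry at a still-retained index.
    const std::string& At(uint64_t index) const {
        assert(index >= FirstIndex() && index <= LastIndex());
        return entries_[index - FirstIndex()];
    }

private:
    uint64_t snapshot_index_ = 0;      // last index folded into a snapshot
    std::deque<std::string> entries_;  // entries FirstIndex()..LastIndex()
};
```

Without something like `TruncatePrefix`, the log grows for as long as the cluster runs, which is why it matters for production use.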


Well, there's another project in the same organization, iNexus, that has implemented log truncation. It uses leveldb as its underlying storage, and that leveldb is slightly modified to clean out outdated data when compacting. Maybe BFS will do something similar. The source code is at https://github.com/baidu/ins and I'm sorry for the lack of English documents in that repo. We are working on it.


Thank you - excited to see this! I do think there is a lack of a C++ library for Raft that stands alone (and you have two projects just within Baidu that could share code). I'd be excited to help with a standalone project! And I'm sorry for my lack of non-English, but it seems that the variable names are still in English so I can follow the code :-)

(It is a pity that Chrome doesn't automatically translate github pages that contain different languages - not sure why that isn't happening.)


Looking through the code it supports fuse, but the documentation in ENG is sparse. It also looks to underpin Tera: the Baidu distributed DB.

I think a low read/write latency dfs suitable for real time applications would be a game changer. I'm hoping they up the documentation from here and engage the English speaking community.


Thanks for your advice. We are working on translating all the documents :)


PS: if your DFS works within a docker container you'll have a very strong differentiator since the rest don't. You'd also possibly solve the "how to do storage in a container cloud without resorting to NAS or separate clusters" problem.


> if your DFS works within a docker container you'll have a very strong differentiator since the rest don't.

Untrue. Gluster is already deployed that way in many places. Yes, in production and at scale.


Do you mean hackery of this sort:

http://blog.xebia.com/persistence-with-docker-containers-tea...

Or do you know of a clean, container-only (no plugins or special external tools) solution?


Oh, sorry, didn't realize we were playing the "move the goalposts" game. If you were to google for "gluster" and "containers" you'd get everything from slick marketing stuff to a presentation at the recent Gluster developer summit in Berlin. I have no idea if any of those would meet your next set of standards but, frankly, meh.


Container hosting with a homogeneous cluster constraint was a real requirement for me that I could not find a solution for among existing options, but since you're a Gluster dev you'd probably know better whether it's possible, so I'm happy to stand corrected. Thanks for correcting, and no offence intended.


It is certainly possible. The first user I know of who did this was using Mesos. Nowadays the push is more around doing it with Kubernetes and OpenShift; I know there was at least one presentation on it at Red Hat Summit. I'm a core-infrastructure guy, so that's kind of not my bailiwick, but if there's nothing in Gluster's own documentation about such things there might be something in one of those other communities.


I spent a good bit of time yesterday just throwing all the docs into google translate. Tera looks really interesting, but currently there is no way I can use it unless there is documentation in my native language :/


Hey, don't suppose you know if the Baidu Maps team will be translating their docs anytime soon? (specifically the android API docs)


BFS has a very limited FUSE client.

There is another distributed file system that supports full POSIX semantics with a well-tuned FUSE client, called MooseFS [1].

My ex-employer used that in production for about 8 years, the biggest cluster has more than 2PB.

Disclosure: I'm a MooseFS fan and contributor :)

[1] http://moosefs.org/


Speaking of documentation, I see almost no comments in the code.

It all looks reasonable, but this takes self-documenting to an extreme.


The code is not very self-documenting either. There are >50 line long functions with mixed levels of abstraction, and error handling code is completely mixed with logic. I find such code quite hard to read and lack of comments doesn't help.


Based on the design[1], it has a leader / follower pattern (although you should have multiple leaders with Raft consensus to avoid having a single point of failure), where the leader is called "nameserver" and decides where to put each piece of data and metadata among a set of chunk servers and metadata servers.

That design is very reminiscent of CephFS's cluster monitors, metadata servers, object storage devices.

[1]: https://github.com/baidu/bfs/blob/master/docs/design.md, https://github.com/baidu/bfs/blob/master/docs/BFS_design.md


I thought Raft was always-single-leader? Followers can happily become leaders through elections?


Sorry, ambiguous choice of term. The set of nameservers "lead" the rest of the server cluster. Within the set of nameservers, they make decisions by electing a leader among them.


What is the leader/follower pattern? Something like the master/slave approach?


It's the PC version of master/slave. See [0] for the original madness...

[0] https://github.com/antirez/redis/issues/3185


Good fucking god. This is insane. And anyone who opposed the proposal, even while pointing out the fallacy of the core idea, got downvoted to hell too. This gives me a lot of context for what I saw in the last season of South Park.


Even worse, look at GitHub banning repos for using offensive words: https://news.ycombinator.com/item?id=9966118


antirez handled that pretty well.


Yeah, I'd say so too. IMHO it was a mistake to use the docs for political purposes, mostly because doing so lends legitimacy to the issue.

Where I think he did well was in de-escalating tension.


Leader/Follower is actually something entirely different, but the linked Chinese document talks about a master/client approach.

Sadly, in the past years, due to some political movements, the term "master/slave" has been declared problematic, and GitHub actively warns that projects using such language can and will be excluded from the service.

There have been previous discussions about this on HN.


Wow, that's very interesting. I thought GitHub delegates moderation to the repo owners.

There was actually a huge debate about this on Reddit caused by Swift merging a rename change PR into master. The Swift team was so excited about the change for some reason that they didn't even run tests before the merge...


As I mentioned in other subthreads (sadly I can’t edit the original comment anymore, so I have to duplicate content), there is this very famous example of several repos getting banned, and another getting threatened to be banned, for using the word "retard": https://news.ycombinator.com/item?id=9966118


Do you have a link for that? I don't follow swift development and I'm unsure what they actually changed.

Rename of what?


Changing variable names using master/slave to leader/follower.

Here's my comment from the thread:

https://reddit.com/r/ProgrammerHumor/comments/3veu2t/comment...

The rest of the discussion is a good read too.


Github says that? Can you share a link? I thought moderation is up to users unless someone is actually abusing the service.



Where has Github warned this? I can't find any official documentation about this.


I can’t find the specific case of "master/slave", but you can find a similar case on "retard" here: https://news.ycombinator.com/item?id=9966118


Are these conceptually/semantically different from master/worker?


They need to stop saying "real-time". Real-time does not mean "fast", it means "guaranteed performance". This is nothing of the sort.


Can a distributed storage expert comment in what ways this differs from hadoop?


- First of all, it's implemented in C++ (JVM/GC is a pain in the ass)

- Clear architecture (only master and dataserver)

- Very concise config file and easy to deploy

- Most important, 10k-node scalability without a federated namespace design


Lack of good documentation, no tests and possibly undefined behaviour in a few places. The code also doesn't look any cleaner than HDFS and uses some weird mix of C (*printf, error codes) and C++ (vectors, smart pointers, RAII etc).


> weird mix of C (*printf, error codes) and C++ (vectors, smart pointers, RAII etc).

Haven't looked at any code, but what you describe is very common usage.


For the distributed FS people out there, I've got a complicated question. In my job I need to poll and collect data from many remote sensor devices, log all the output, and process that. Not only do I do this, but MANY of my colleagues do it too, and each has a different way to manage the process. Can a distributed file system help with this case?

Is there any file system that would be able to sync what amounts to text/binary data across many hosts and allow me to aggregate the data off the network for more secure storage?

I was thinking about using IPFS for this but this also seems better. I'd hopefully like to have a private network for this use case so that other people can't post up a device on this file system and introduce fake data.


Interesting: Google flags, protocol buffers, and Google C++ style.


If you look at Baidu's infrastructure, it's almost like a parallel universe where the names are identical or almost identical to Google's: BFE, GTC, GSLB. And BFS does look a lot like GFS2 aka Colossus.


Really makes you think...


Seems that most of Baidu's C++ open source projects have this pattern as well, albeit with minor variations on the Google C++ style such as 4 spaces for indentation.


More like a C++ clone of HDFS than most people are likely hoping. While you seem to be able to mount it with FUSE I imagine it's primarily meant to be programmed against directly.

Using Raft over a dependency on an external consensus system is nice. Definitely makes the namenode architecture much better.


It looks like a faster version of HDFS since it's written in C++ (vs Java).

Another important aspect is that it is using SSD + SATA (I suppose), which could be a better option than standard SATA/SSD or LV cache using SATA + SSD.

Even if it's just a new thing, if it proves to be faster it may be implemented in the Hadoop ecosystem in the future. HDFS has a lot of features, being a mature piece of software, but it is lacking in response time.


"It looks like a faster version of HDFS since it's written in C++ (vs Java)."

This is a non sequitur. The conclusion does not follow from the premise.


During non-GC periods, probably true. But having a realtime filesystem service that is prone to stop-the-world GC pauses is a showstopper for many applications.

Also, a C++ implementation is likelier to use far less memory than a Java implementation, assuming the skills of both programmers are roughly equal.


The underlying local filesystem on each node is not truly realtime, so a "realtime distributed file system" is already quite a stretch. Also JVM is perfectly fine with pause times below a few tens of ms worst-case (when using properly tuned G1, CMS GC), which is lower than worst-case latency induced by network + I/O.

As for using less memory - you don't allocate buffers for file data on the JVM heap. You allocate them in native memory exactly as you'd do it in C++. Therefore it is possible to create a JVM-based file system that handles petabytes of data with just as little as 100 MB heap, used mostly for small temporary objects.

Also, the code here is using mutexes a lot to synchronize threads and lock out whole objects. Therefore I think these "realtime" claims are quite exaggerated.


You're using the academic version of realtime, not the one that anybody cares about. HDFS's biggest problem is, and has always been, that it's literally impossible to tune it to give anything like reliable performance, mostly because the nameserver is a single point of lag for the entire system. "Worst case network and IO" latency is a huge stretch. Network performance is predictably sub-ms if you're using a network designed for modern distributed computing. (A real stretch, I know, since almost all HDFS installations are on old-school core-router-tree infrastructure.) The IO operations are incredibly unpredictable, but only for one client at a time. Having individual servers with 10-20ms worst-case performance hiccoughs is nowhere near as bad for a system as all of your clients hiccoughing for even 5ms at the same time.


HDFS's biggest problem is its SPOF master-slave architecture, not the JVM or GC. With a truly distributed shared-nothing system, Java GC would not be a problem, because servers can now run with no major GC for hours or days. So two servers or clients doing GC at the same time is very unlikely. And even if some of them do, the pauses from GC are much more predictable than the pauses from I/O, which on a loaded system can take seconds, not milliseconds.

Also if GC was such a huge problem, exchanges or HFT companies wouldn't use Java for their low latency stuff, and there definitely are companies which do.


> Also if GC was such a huge problem, exchanges or HFT companies wouldn't use Java for their low latency stuff, and there definitely are companies which do.

Can you name one?


LMAX, New York Exchange.


Wow, that's neat. Thanks for the pointer!


> As for using less memory - you don't allocate buffers for file data on the JVM heap.

I meant the code size and heap allocations for data structures, not file buffers.

And 100MB is huge compared to many C++ programs. And that's on top of the Java runtime!


Sure, and this C++ DFS's memory use is probably huge compared to many hand-crafted assembly or C programs from the 1980s. But who cares? 100 MB or even 1GB is really tiny for today's server hardware. And the Java runtime itself is a few MB really. What takes the most memory in many Java programs (e.g. IDEs) is code and libraries.


Size can lead to a tremendous difference in performance on modern CPUs, particularly if you can take advantage of L2/L3 instruction and data caches. It still matters, even on modern "big memory" systems where gigabytes of installed RAM are the norm.


Technically correct, but filesystems are mostly about I/O. For example this Baidu filesystem copies blocks of data into userland memory and transfers them in RPC messages - any system using proper zero copy approach would easily beat it even if coded in Python or JS. Baidu also seems to use threads, locks and SEDA instead of more efficient (but much harder to code) thread-per-core async architecture. Threadpools and lock based synchronization are terrible for latency.

The fact that something is in C++ doesn't make it automatically efficient. And particularly if we're talking about milliseconds, not nanoseconds, in Java or C# you can do just about everything you can do in C++, performance-wise.


What's the difference then between this and MapR, besides a similar CLDB pattern? (No single point of failure)


Is there any benchmark available?


What amazes me the most is when they change titles. One Chinese character sometimes matches one English word.

Eg. lylei changed the title from "cs启动太慢" to "cs start is too slow " ( on https://github.com/baidu/bfs/issues/376 )

lylei changed the title from "其他SDK写策略" to "SDK writing strategies(fan-out write for example)" (on https://github.com/baidu/bfs/issues/243 )


"cs启动太慢" means "cs start-up too slow," where 启动 is start-moving and likely a verb-result construction, a pattern in Chinese that to my knowledge doesn't exist in Germanic languages. The second one is more accurately translated to "Other SDK writing strategies."


Not commenting on the BDFS so much as it's really cool to see large Chinese companies contributing to open source. Does anyone know of other large projects outside of the main Android forks? Pardon the ignorance.

Also wonder if there will be larger skepticism toward integrating Chinese O/S in regards to potential influence by the government (like the NSA has tried to influence in the past)


This looks extremely promising and good. I work on distributed systems, in particular on databases (so one abstraction layer above file systems). This looks like it would make for a really nice storage engine for https://github.com/amark/gun . Also it is nice to see non-English projects! Very exciting work.


How is this better/different than HDFS? Is this simply an example of NIH?


What I really want to know:

"Once your code has passed the code-review and merged, it will be run on thousands of servers"

And the Chinese text below says tens of thousands of servers, which is it? :-)


Considering Baidu's scale, it would be tens of thousands.

There are several other discrepancies in the doc between the Chinese version and the English one. Some technical proofreading is needed.


In this case, I think it's fine. Chinese has a named number for 10000 (wàn/万), so they used that. Since English doesn't, they used 'thousands'. In either case, the idea is that the code would run on a large number of servers.

For instance, Hindi has special names for 100000 (lakh), 10M (crore/karod) etc. so a similar translation to Hindi would use those even if it meant introducing a factor of 10 in the literal interpretation.


Has anyone found a good deep-dive on the architecture in english?


I wonder why they chose to rewrite Raft instead of using etcd or another working Raft solution.


You usually don't want to add another dependency you don't control to your system if you don't have to (in terms of time and resources).

It's another point of failure, more infrastructure you have to keep alive...


Wasn't there already a file system named BFS? This might get confusing.


BFS was part of BeOS which is defunct. The creator went on to work at Apple.


Is it not still used by Haiku?


I wonder how it compares to quantcast's qfs, anyone know?


Does it support ipv6?


The underlying socket libraries might in theory, but they're using them poorly. Example from nameserver_main.cc:

    std::string listen_addr = std::string("0.0.0.0") + server_addr.substr(server_addr.rfind(':'));
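To make the problem concrete: the line above keeps only the port and hard-codes the IPv4 wildcard, so an IPv6 address in the config can never be bound. A rough sketch of an alternative follows. `WildcardListenAddr` is a hypothetical helper, not BFS code, and I have not checked whether the RPC layer BFS uses accepts the bracketed `[::]:port` form, so treat that as an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not BFS code: keeps the port from "host:port" or
// "[v6-literal]:port" and pairs it with the matching wildcard address.
std::string WildcardListenAddr(const std::string& server_addr) {
    // The port follows the last ':' in both accepted forms.
    const std::string::size_type colon = server_addr.rfind(':');
    assert(colon != std::string::npos);  // caller must pass an address with a port
    const std::string port = server_addr.substr(colon + 1);
    // A leading '[' marks a bracketed IPv6 literal such as "[::1]:8828".
    const bool is_v6 = server_addr[0] == '[';
    return (is_v6 ? std::string("[::]:") : std::string("0.0.0.0:")) + port;
}
```

Hostnames and unbracketed input still fall through to the IPv4 wildcard here, and a bare IPv6 literal with no port would be misparsed, so a real fix would need proper address parsing rather than string slicing.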


Seems good! But I have tried to uninstall it and I can't...


Anybody know how this differs from AFS?


I think AFS is still only replicated for read-only, not for read-write.


No, it's replicated read-write, but according to wikipedia, file locks are only machine-wide, so write collisions are easy to create.


Are you sure? From that same wikipedia:

"AFS volumes can be replicated to read-only cloned copies."


They can be replicated to read-only copies, but AFS also supports multiple machine writes.


How does this compare to IPFS?


Nice work!


awesome


Care to explain the awesomeness you found?


I wanted to write awesome too, so I'll be more detailed :)

Seems (have not tried it yet) awesome because:

- another big party offering such software; the more choices, the merrier for users/sysadmins

- sandboxed

- scalable to 10k nodes

- no single point of failure

- SSD and traditional disk usage via the disk manager


I see many positive comments that are not downvoted, so when you downvote someone saying "awesome", I suspect it is because you disagree, not because it was a low-value post, which would be the proper reason to downvote. Also, your response was "explain why"; again, I don't see people usually questioning each acclamation.


For the record I didn't downvote GP, at the time I asked there were only two comments and I was in the mood for learning as this isn't my area.


No, it's pretty clearly downvoted because it contains no elaboration at all, which with one exception all other non-gray posts do.


Released right on the heels of the IPFS announcement?


"The" IPFS announcement? IPFS itself has been around for quite a while. Could you be more specific to which announcement you're referring, and how that relates to BFS?


Why is there no English documentation?


Working on it. ReadMe, issues and PRs are just a beginning. We definitely want to involve as many contributors as possible.


Because it is written by Chinese developers for a Chinese company?


They put that code on GitHub, releasing it to non-Chinese developers.

Wouldn't it be good if the code had at least some documentation in English? It's not that they don't want to; they have some parts in English.


GitHub isn't just for English users


The hegemony of English is unquestionable.


Last time I checked English was the 3rd most spoken language on the planet.

- Chinese

- Spanish

- English

You mean in tech?


That's ordering by native speakers. If you count non-natives, English has far more speakers than Spanish (but far less than Mandarin).


Also, English is the most widely spoken too, by a wide wide margin.


On purpose, to give English-speaking programmers a taste of what it is like for non-English speakers /s


Why is there rarely any non-English-language documentation for most codebases nowadays? The world doesn't revolve around English-speaking countries.


Because English is the lingua franca of the software industry, and developers are usually expected to know English, no matter where they're from.


The point of my comment was, that shouldn't necessarily be the case forever, and it might not be reasonable to expect it to be.


Sure, nothing ever stays the same, but I think that for the foreseeable future, we can reasonably expect English to stay the universal language of software development. It's the default foreign language people learn in their home countries for a lot of reasons, so a lot of people who want to get into the field already have at least a basic understanding of English. This aids them in learning and communication with other developers, and it's just too convenient to be displaced any time soon.


Because it is a lot of work. Nginx always had good Russian docs, for reasons you can guess.


Why should there be?


Why is the summary in English?


Hopefully the Chinese will not be as arrogant as you when they take over the leading role on this planet.


I was not arrogant. I am not a native English speaker. I expect most software to have documentation in English.

Do you want to learn 5 different languages just because 5 different kinds of people write good software?

It's easy to blame others.


I just want to note how comments complaining about Mandarin being the dev language get downvoted (by sympathetic Europeans);

while at the same time so do those complaining about English's hegemony in India (by furious Indians).

Strange is our world.


Maybe both kinds of comments are being downvoted by those of us who like technical conversations not to be full of people bitching about other people's choice of language. I don't find that strange at all.


Too bad that the documentation is so poor... Having PRs in Chinese is not ideal either.


Do you even know how it is when not-your-own-language is the dominant one for everything in computers?


I know how it is, and I'd take English docs over my native language any day.


Sssshhh don't point out the hypocrisy.


Is it hypocrisy? English is not my native language but I consider it the default CS language. There must be a way for us to share knowledge, and that turned out to be English.


Yes, but it is a problem. Unlike computer languages, human languages do not only convey pure meanings, i.e. pure descriptions of relations between entities. They carry a full baggage of culture, so even if it is convenient and pragmatic to use English as the main language in CS, it is not neutral: it is both an effect and a cause of the Anglo-Saxon cultural, economic and military hegemony over the world.


https://github.com/search?q=baidu&type=Everything&repo=&lang...

I feel like we should just make up random strings when we name things...


Can't speak much for the project, but have to admire them for sticking with Mandarin.

India is at least 100-200 years away from something of this kind happening; or more likely it will never happen at all.

Indeed, there is not a single research university worth its name that isn't also essentially an export hub of brains to the 5-eyes (& Singapore).

English is crucial for India's system of feudal slavery to work.


I don't think they stick to Mandarin because they want to. English is considered hip and intellectual in China, especially in the first tier cities which I suppose the developers of such modern technology are. But the problem is that English is really, really hard for Chinese native speakers, since grammar, words, culture and pronunciation are so different from all the Chinese languages.

So as a team leader or project manager in China I would probably also stick with Chinese since it is much easier to find really good and not too expensive employees that way. Let them try to use English, support the ambition, but don't enforce it.

And I don't know much about India, but from what I've heard it is more a cultural issue that makes India lag behind. Everything is (so I heard) still very traditional and backward-focused, while China as a country spent 20-30 years becoming more open to new ideas and approaches. How true is that from other people's perspective here?


There are deep social divisions in India, considering its tortured (and propagandized) history.

http://sankrant.org/2011/03/the-english-class-system-2/

There are systematic faults, which prevent much change, if the current policies are kept up (note: India's literacy rate is ~78 %, since literacy (except in English) brings no great advantage).

http://www.nytimes.com/2015/03/22/opinion/sunday/how-english...

http://www.forbes.com/sites/realspin/2014/11/06/the-problem-...

Imagine China with only the expensive class of engineers, for instance. Or at least one with this class, plus another class that was educated in English but barely knows the language, let alone possesses any usable skill.

There are now villages, driven by this economics, where rural-children are being taught in English. Considering how bad the Japanese/Chinese are with English, it shouldn't be hard to interpolate how disabling this is when everything is being taught in a foreign language.


>India is atleast a 100-200 years away from something of this kind happening; or more likely never at all.

What do you mean by this statement? Do you even know that GlusterFS (which later became Red Hat Storage) was developed in India?

First try to understand the context before starting your racist rants.

Disclaimer : I'm Ex-GlusterFS dev.


That India is a linguistic-apartheid state doesn't contradict the fact that the creme de la creme is quite good (very apparent from the population in the US).


Why would India use Mandarin?

English is the most common script in India, so why would we use anything else? Unless you are a Hindian trying to impose your minority language on the rest of us.


Wow. Just wow.

- English is hardly the "most common script". Just because the retainers in Delhi impose the colonial apparatus on us, doesn't automagically give it "statistical power" as well.

- Every state has a (poor, uneducated, illiterate) captive linguistic population more than that of Korea; no reason they ought to use Hindi, nor even Nagari (script != language, in case your education didn't tell you that).

- Among the rich, yes, English is most common, and this is really what matters in the end, ain't it?

This is precisely why India will never be able to work in its own language, and also precisely why it is doomed to eternal poverty and continued illiteracy. Probably will remain a hub exporting little other than people, for the next couple centuries.

(See: https://youtu.be/SJx0KFtm9Rw?t=21m56s)

And this is why I admire China. It's not democratic, it has a paranoid regime, but at least they aren't run by hypocrites who'd use a "socialist democracy" as cover for continued colonization.

> trying to impose your minority language on the rest of us.

Your skills in generating irony amuse me.


Note the debate you are having with the GP is in English.

For better or worse, English is the common language for software development (and aviation, etc.)


> Note the debate you are having with the GP is in English.

You don't say ?

> For better or worse, English is the common language for software development (and aviation, etc.)

English in India is more dense than the feudal castes of medieval Europe; hardly a professional thing this.

The Human-rights wallahs don't complain precisely because the current state of affairs benefits the nations that control them; much as it does their native retainers.


https://github.com/baidu/bfs/blob/master/src/client/bfs_clie...

    std::string pad;
    if (path[path.size() - 1] != '/') {
        pad = "/";
    }

Else?


It's an interface that's no longer in use. Would you like to write an issue to us or make a pr to fix it? :)


:) send me your billing details.


You took time to find it and complain, but want to bill to fix? Classy.


The default constructor for strings produces the empty string.
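A tiny check of that claim, in case anyone doubts it. The function name is invented for this example; it mirrors the declaration in the snippet but drops the path check, so the declaration alone gives `pad` its value.

```cpp
#include <cassert>
#include <string>

// Illustrative only: the declaration is the sole thing that initialises `pad`.
std::string UntouchedPad() {
    std::string pad;      // default constructor runs here
    assert(pad.empty());  // size() == 0 on any conforming implementation
    return pad;
}
```

So the missing `else` in the original leaves `pad` as `""`, which is presumably what was intended.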


I imagined that. I am more like "let me tell you exactly what I want" rather than relying on behavior defined elsewhere. It would not pass review where I work.


Not to be rude, but I'm glad I don't work where you work.

How would that be consistent with your own classes? "Oh no, you can't just use a 'Tree' object, you need to explicitly set that there are no leaves yet, no branches yet, no squirrels yet, etc… etc…"

Do you .clear() your vectors before you use them?

This sounds like newbies that do: #define TRUE (1 == 1)


It's not about initialization, it's more about specifying clearly what happens in all cases.

Anyway, the reason it would not pass review is that today you use one compiler, tomorrow you have to use another, and then you have to review all these little details again. It's about saving money more than anything, and you do that by not relying on compiler behavior.


It is specified clearly though. The behavior is not compiler dependent, it's specified in the C++ Language standard. See http://en.cppreference.com/w/cpp/language/default_initializa... and http://www.cplusplus.com/reference/string/string/string/

If a different compiler breaks this behavior, it's not standard compliant and thus could do all sorts of stuff in every possible line, including in:

    std::string pad = "";


Standards change. Don't they?


So how can you use operator= or copy constructor of string?

The '= ""' won't help you.


Do you think that's any more likely than the constructor of std::string changing?


Sure, I would have preferred a non-branching:

std::string pad = descriptive_name_here(path);

with the added bonus of being able to add "const" to that, for the benefit of the reader.

This is not relying on compiler implementation! Can you name one language that has strings that initialise to anything but a valid object containing an empty string?

This is not an obscure side-effect. This is like assuming "std::vector<int> v;" creates an empty vector, not an undefined-state vector container.

(I don't want someone coding C++ as if all objects are references. Coding in one language as if it were another is a well-known antipattern)
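
A minimal sketch of what such a helper could look like. The name trailing_slash_pad is hypothetical (the comment above leaves the name open), and treating an empty path as "needs a slash" is an assumption, not something the original code defines:

```cpp
#include <string>

// Hypothetical helper: returns "/" when `path` lacks a trailing slash,
// otherwise "". An empty path is treated as lacking one (an assumption).
std::string trailing_slash_pad(const std::string& path) {
    const bool has_slash = !path.empty() && path.back() == '/';
    return has_slash ? "" : "/";
}
```

The decision still happens inside the helper, but the call site becomes a single initialization, so it can be written as "const std::string pad = trailing_slash_pad(path);".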


In Java the default value for a String (or any object) is null, not an empty string.


Did you not read my whole comment? Please read the whole thing before replying.


Read the whole comment. Still don't see anything that invalidates my answer to "Can you name one language that has strings that initialise to anything but a valid object containing an empty string?".


You specifically mentioned Java, and I specifically mentioned reference-based languages, with Java being the most obvious example.


Looks like I can't edit, so I'll create another comment.

I ran into this article that puts quite nicely why the problem isn't the "else", but the "if" itself:

https://medium.com/@bartobri/applying-the-linus-tarvolds-goo...

This is what I meant in the other comment by preferring the non-branching.


> the reason for which it would not pass review is that today you use one compiler, tomorrow you have to use another

Good thing then that it's mandated by the language standard, and not up to the compiler to decide. According to C++11, §21.4.2/1, a default-constructed std::string must be an object of class std::basic_string with non-null data and a size of 0.
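
A small sketch to make that concrete. The two function names are made up for illustration; the only thing relied on is the default-constructor guarantee cited above:

```cpp
#include <string>

// No initializer on purpose: the default constructor runs here.
std::string default_constructed() {
    std::string pad;
    return pad;
}

// The same value written with an explicit initializer.
std::string explicitly_empty() {
    std::string pad = "";
    return pad;
}
```

On any conforming implementation, both functions return a string that compares equal to "" and has size 0.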


That define is very clever: it always lands on its feet! :)


Except there's already a 'true'. When I see this I know that whoever wrote it is completely incompetent (as in "does not know programming", not "is stupid").

It's clever, yes. The bad kind of clever that's also misguided.


Sometimes there is a 'true', sometimes there isn't. Sometimes the code is new, sometimes it is legacy code. You are making too many assumptions. When I see this I know that...


The point of defining true to (1==1) is that it's "future proof" in case implicit typecasting to bool works in a world where 0 evaluates to "true".

That would break approximately ALL C code.

You're being ridiculous. You might as well try to protect against the meaning of "if" changing.

I've seen amateur code that tries to protect against "stdio.h" going away and therefore reimplementing everything in it. This is like that.

Believing that the meaning of everything can change means that you cannot use anything you didn't code yourself. If you can't trust documented APIs, then that's some sort of programmer NIH nihilism.
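
For reference, a sketch of what the language already guarantees. TRUE is the macro from this thread; in C++ the comparison has type bool (in C it is an int with value 1):

```cpp
// TRUE defined the way the thread describes it.
#define TRUE (1 == 1)

// 1 == 1 is simply true, and true converts to the integer 1.
static_assert(TRUE == true, "1 == 1 evaluates to true");
static_assert(static_cast<int>(TRUE) == 1, "true converts to 1");
// Negating zero gives the same value, so the macro adds nothing new.
static_assert((!0) == TRUE, "!0 is also true");
```

All three checks are evaluated at compile time, so the file compiles only if they hold.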


But to a C++ programmer, that code tells exactly what is going on... There's no hidden logic, no non-standard or compiler dependent features.


It's a distributed real-time filesystem. I'm guessing optimizations are key?


Probably.


I have not used C++ much, but isn't pad default-constructed to be an empty string here?


what if path is empty?


Then they have a problem. Undefined Behavior.
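
A sketch of why: size() returns an unsigned type, so for an empty string "path.size() - 1" wraps around to the largest possible value, and indexing with it is out of bounds. The function names below are hypothetical, and the guarded version is one possible fix, not necessarily what the project would choose:

```cpp
#include <cstddef>
#include <limits>
#include <string>

// The index the original expression computes for the last character.
std::size_t last_index(const std::string& path) {
    return path.size() - 1;  // wraps around when path is empty
}

// Guarded check: the index is only evaluated for a non-empty path.
bool ends_with_slash(const std::string& path) {
    return !path.empty() && path[path.size() - 1] == '/';
}
```

With the guard, an empty path is simply reported as not ending with a slash.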



