Hacker News

Off topic, but the "Data Handling" landscape has become extremely confusing. By Data Handling, I mean:

  - Collection
  - Distribution
  - Processing
  - Storage
  - Querying

Earlier, it was SQL + Software, but now it's a myriad of systems, a myriad of methods, with overlapping functions and a whole slew of libraries that come with related tools.

I do understand that the underlying need for all this is the sheer volume of data being generated in the world. But as an experienced programmer who is a newbie to big data, I find it overwhelming.



> Earlier, it was SQL + Software, but now it's a myriad of systems, a myriad of methods, with overlapping functions and a whole slew of libraries that come with related tools.

Apache Arrow and the steadily growing ecosystem around it are out to simplify those myriad ways of doing things and bring about interop between various frameworks (Spark, Trino, Snowflake, etc.), libraries (pandas, dplyr, etc.), and languages (R, Julia, Python, Rust, etc.).


You don’t have to use those systems. Fundamental systems are still valid and solid. Don’t be afraid to reinvent a wheel that suits your case perfectly.
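
To make the "fundamental systems" point concrete, here is a minimal sketch, assuming the data fits comfortably on one machine. It uses only Python's standard-library sqlite3 module to cover collection, storage, and querying. The `events` table, its columns, and the sample rows are made up for illustration and don't come from anyone's real setup.

```python
import sqlite3

# Hypothetical sample data: (source, bytes_received) pairs.
SAMPLE_EVENTS = [
    ("sensor-a", 120),
    ("sensor-b", 300),
    ("sensor-a", 80),
]


def total_bytes_by_source(rows):
    """Store rows in an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Storage: one plain table (name and columns are illustrative).
        conn.execute("CREATE TABLE events (source TEXT, bytes_received INTEGER)")
        # Collection: bulk-insert the incoming rows.
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        # Querying: ordinary SQL aggregation, sorted for a stable result.
        cursor = conn.execute(
            "SELECT source, SUM(bytes_received) FROM events "
            "GROUP BY source ORDER BY source"
        )
        return cursor.fetchall()
    finally:
        conn.close()


totals = total_bytes_by_source(SAMPLE_EVENTS)
print(totals)  # [('sensor-a', 200), ('sensor-b', 300)]
```

Swapping ":memory:" for a file path would persist the data. Whether this scales far enough depends entirely on your volume, which is the judgment call the comment above is pointing at.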



