std.fileformats?
January 06, 2020
In Phobos there are several top-level modules, each of which deals with one specific file format. Wouldn't it be better to put them all into std.fileformats (or some similar place)?

std.base64
std.csv
std.json
std.xml
std.zip

If yes, some more questions pop up:

a) Should we use the change to replace some of the modules? At least for std.xml I found std-experimental-xml [1], which seems to be intended to replace std.xml eventually. And there are lots of packages which address JSON. I don't know if there is one which is *the* candidate for replacement.

b) Should we instead remove some of these? Probably std.zip is here the first candidate. (I put some work in it in the last few weeks, but it would be fine for me throwing this away.)

It's on a gut level - just wondering, what you think about this.

[1] https://code.dlang.org/packages/std-experimental-xml

January 06, 2020
On Monday, 6 January 2020 at 19:40:10 UTC, berni44 wrote:
> b) Should we instead remove some of these? Probably std.zip is here the first candidate. (I put some work in it in the last few weeks, but it would be fine for me throwing this away.)
>
> It's on a gut level - just wondering, what you think about this.
>
> [1] https://code.dlang.org/packages/std-experimental-xml

IMO all except base64 could be removed. Putting everything into the standard library made a lot of sense in the times before we got a package manager. Nowadays it might be better to simplify the standard library and just have XML, JSON, ZIP, CSV as "blessed" packages.
January 06, 2020
On Monday, 6 January 2020 at 21:05:37 UTC, JN wrote:
> On Monday, 6 January 2020 at 19:40:10 UTC, berni44 wrote:
>> b) Should we instead remove some of these? Probably std.zip is here the first candidate. (I put some work in it in the last few weeks, but it would be fine for me throwing this away.)
>>
>> It's on a gut level - just wondering, what you think about this.
>>
>> [1] https://code.dlang.org/packages/std-experimental-xml
>
> IMO all except base64 could be removed. Putting everything into the standard library made a lot of sense in the times before we got a package manager. Nowadays it might be better to simplify the standard library and just have XML, JSON, ZIP, CSV as "blessed" packages.

Dependencies will be our doom.  Whereas if you use something in Phobos then you have some confidence it will still build across platforms in a couple of years.

It also seems to make sense to work on making code.dlang.org ultra-reliable before making it necessary to use libraries before doing even basic things.


January 06, 2020
On Mon, Jan 06, 2020 at 11:38:25PM +0000, Laeeth Isharc via Digitalmars-d wrote:
> On Monday, 6 January 2020 at 21:05:37 UTC, JN wrote:
> > On Monday, 6 January 2020 at 19:40:10 UTC, berni44 wrote:
> > > b) Should we instead remove some of these? Probably std.zip is here the first candidate. (I put some work in it in the last few weeks, but it would be fine for me throwing this away.)
> > > 
> > > It's on a gut level - just wondering, what you think about this.
> > > 
> > > [1] https://code.dlang.org/packages/std-experimental-xml
> > 
> > IMO all except base64 could be removed. Putting everything into the standard library made a lot of sense in the times before we got a package manager. Nowadays it might be better to simplify the standard library and just have XML, JSON, ZIP, CSV as "blessed" packages.
> 
> Dependencies will be our doom.  Whereas if you use something in Phobos then you have some confidence it will still build across platforms in a couple of years.
[...]

I agree; recent experiences have led me to the conclusion that dependencies are a liability rather than a benefit.  The only exception is if you copy the code into your source tree, and periodically (manually) update it.  Dependency resolution is NP-complete (gives a whole new meaning to "dependency hell"); that should be a big red flag that dependencies are something to be avoided where possible, or at least treated with care, not relished.

Even std.zip has been the source of trouble in the past: users would install dmd, code away happily, then get slammed with linker errors they don't know how to resolve.  Eventually, the solution is to bundle zlib with the dmd distribution packages.  Again, you see, dependencies are a liability, and the solution is to bundle the dependency with your main package so that the user never has to do any dependency resolution.

This is why I've found that Adam's arsd libraries have been the best out of the D libraries out there: his philosophy of minimal (preferably no) dependencies, and everything bundled in a single source file, has been a boon. You just copy the source file into the right place in your source tree, check it in as part of your code repo, and never have to worry about sudden breakage beyond your control.  Every now and then, just to stay up to date, git pull Adam's repo and copy the new file(s) over.

Doing this manually may seem tedious in this day and age of instant gratification, but it's actually a benefit:

(1) Since you manually copy the file(s) over, you're aware of what dependencies exactly you're pulling in, and (hopefully) are taking measures to prevent pulling in unnecessary extra cruft;

(2) You will likely have enough sense to make sure your code compiles with the new version of the file before committing to the repo, thus avoiding the all-too-common problem of code breakage due to incompatibility with newer versions of the dependency: if it's checked in, it compiles, 100% guaranteed. Your buildability does not depend on the volatile state of some random server somewhere out there on the Internet.

(3) Your collaborators will never be compiling using different versions of your dependencies and getting different results, which cause a lot of headaches trying to track down problems (everyone's build behaves slightly differently).

(4) Should the upstream authors of the dependency abandon their project, vanish into the ether, or the dependency otherwise becomes unavailable, your code will still compile and still work. Again, you remove the possibility of random breakage caused by random internet outages. Future readers of your code will appreciate that *all* the code is there in the repo, without big missing chunks from dependencies that may no longer exist 10 or 20 years down the road. Even if the code will no longer compile by then, they can at least still see how it works. (And if you take this to the logical conclusion, bundling the exact version of the compiler you used to build the executables seems a logical possibility that will ensure compilability far into the future, even after your dependencies' maintainers have long abandoned it.)


After so many decades of academia and industry alike trumpeting code reuse, I'm starting to become skeptical that perhaps King Code Reuse has invisible clothes. Dependency hell is a smell suggesting that something is fundamentally wrong with the concept.


T

-- 
That's not a bug; that's a feature!
January 06, 2020
On Monday, January 6, 2020 12:40:10 PM MST berni44 via Digitalmars-d wrote:
> In Phobos there are several top-level modules, each of which deals with one specific file format. Wouldn't it be better to put them all into std.fileformats (or some similar place)?
>
> std.base64
> std.csv
> std.json
> std.xml
> std.zip
>
> If yes, some more questions pop up:
>
> a) Should we use the change to replace some of the modules? At least for std.xml I found std-experimental-xml [1], which seems to be intended to replace std.xml eventually. And there are lots of packages which address JSON. I don't know if there is one which is *the* candidate for replacement.
>
> b) Should we instead remove some of these? Probably std.zip is here the first candidate. (I put some work in it in the last few weeks, but it would be fine for me throwing this away.)
>
> It's on a gut level - just wondering, what you think about this.
>
> [1] https://code.dlang.org/packages/std-experimental-xml

Are you talking about putting them all in a single module or all in a single package? Putting them in a single module would be the opposite of the direction that we've been going with Phobos over the past few years. We've generally been breaking up larger modules, not merging modules. So, if we do rearrange these modules, it would definitely not be by putting them all in one module. However, even if we moved these modules into a sub-package, moving them around at this point would arguably just be unnecessary churn. It would break existing code for minimal benefit. If we replace any of them with new implementations (e.g. there's been talk of replacing std.xml and std.json for years now), then maybe they should go in a deeper package hierarchy, but I really don't think that it makes much sense to simply rearrange modules. It's also the sort of thing that Walter tends to be against.

Now, personally, I don't think that anything regarding file formats should have been in the standard library in the first place. IMHO, that's not the sort of thing that belongs in a standard library, and they really should have been on code.dlang.org. However, when they were originally written, we didn't have code.dlang.org, so they ended up in Phobos. Either way, based on previous discussions on adding stuff to Phobos, I think that it's pretty clear that any new, major additions would have to be on code.dlang.org and battle-tested there before being moved into Phobos, and I don't see any of these modules being replaced any time soon even if we want to replace them.

std.experimental.xml was a GSoC project that was not completed and is basically dead. It seems like the original author got too busy with school and never got back to it. For it to get anywhere, someone else would have to finish it. It was started with the intention of replacing std.xml, but like any other major addition, it would still have to go through the Phobos review process and be voted in.

http://code.dlang.org/packages/dxml might end up in Phobos at some point, but that's not a fight that I want to fight right now, and the only reason that I think that it would make sense to put any XML parser/writer in Phobos at this point is because we already have std.xml, and it really needs to either be replaced or removed.

http://code.dlang.org/packages/std_data_json was a candidate for replacing std.json, but IIRC, there was basically too much arguing over what the new std.json should look like, so Sonke gave up on getting it into Phobos. No one has tried to get a new JSON solution through the Phobos review process since then, and I think that for the most part, people have been happy to just put their stuff on code.dlang.org rather than trying to push anything through the Phobos review process.

For the most part, I don't see any point in removing any of these modules, since that would break existing code, and while I don't think that Phobos should have been implementing parsers for standard file formats, that doesn't necessarily mean that we should be breaking existing code to remove them. The primary exception would probably be std.xml, since it arguably does more harm than good, but there's never really been a consensus on ripping it out without a replacement being added to Phobos.

BTW, base64 isn't really a file format. It's an encoding. So, even if we were going to move all of these into a sub-package, std.base64 wouldn't belong with them.
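
To make the distinction concrete, here is a minimal sketch using std.base64 (the sample string is only an illustration). The encoder maps arbitrary bytes to ASCII text and back; there is no header, directory, or record structure of the kind that a file format such as ZIP or CSV defines.

```d
import std.base64 : Base64;
import std.string : representation;

void main()
{
    // Any byte sequence will do; the sample text is just an illustration.
    immutable(ubyte)[] raw = "Hello, D!".representation;

    // Encoding turns the bytes into plain ASCII text ...
    char[] text = Base64.encode(raw);
    assert(text == "SGVsbG8sIEQh");

    // ... and decoding recovers exactly the same bytes.
    ubyte[] back = Base64.decode(text);
    assert(back == raw);
}
```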

- Jonathan M Davis



January 06, 2020
On 1/6/20 6:38 PM, Laeeth Isharc wrote:
> On Monday, 6 January 2020 at 21:05:37 UTC, JN wrote:
>> On Monday, 6 January 2020 at 19:40:10 UTC, berni44 wrote:
>>> b) Should we instead remove some of these? Probably std.zip is here the first candidate. (I put some work in it in the last few weeks, but it would be fine for me throwing this away.)
>>>
>>> It's on a gut level - just wondering, what you think about this.
>>>
>>> [1] https://code.dlang.org/packages/std-experimental-xml
>>
>> IMO all except base64 could be removed. Putting everything into the standard library made a lot of sense in the times before we got a package manager. Nowadays it might be better to simplify the standard library and just have XML, JSON, ZIP, CSV as "blessed" packages.
> 
> Dependencies will be our doom.  Whereas if you use something in Phobos then you have some confidence it will still build across platforms in a couple of years.
> 
> It also seems to make sense to work on making code.dlang.org ultra-reliable before making it necessary to use libraries before doing even basic things.

Except you do need to depend on it to do basic things. Have you tried using std.xml?

Because something lives on code.dlang.org does not mean it is unreliable. The reliability depends on the developer, not the platform. I imagine vibe.d is going to be super-reliable for years, and it lives on code.dlang.org.

H.S. Teoh also talks about arsd. This is also available on code.dlang.org (though not exactly built for it), but the mechanism of copying license compatible code into your project so you can maintain it doesn't depend on it being outside of code.dlang.org. You can do that with anything! D modules are pretty movable.

One thing that is common between these two projects (and Phobos as well) --  They are both major projects with almost all their dependencies included in the project. That is, the project moves together, so you always have a cohesive "mini-standard" library.

The dependency problem really shows up when your dependencies depend on dependencies that depend on dependencies that are all written and maintained by various people. A great example of how this can be a problem was the story of npm left-pad https://qz.com/646467/how-one-programmer-broke-the-internet-by-deleting-a-tiny-piece-of-code/

There's something to be said about having things outside the standard library not for the reasons of stability but for maneuverability. One can easily add/improve/change projects that are outside the standard library, but poor decisions in the standard library are sometimes stuck there (see std.xml).

For file formats, or parsers, or really any kind of system that has innumerable ways of solving the problem, I really think the standard library is not where they should be. The benefits are cohesion with the library, and usage by the library. But there are many ways to skin the JSON cat. Especially when there's little reason for JSON to be included in Phobos other than a place for it to be.

Oh, and I agree that reliability of code.dlang.org (or having dub more immune to the web site going down) should be a high priority.

-Steve
January 07, 2020
On Tuesday, 7 January 2020 at 01:45:21 UTC, Steven Schveighoffer wrote:
> H.S. Teoh also talks about arsd. This is also available on code.dlang.org (though not exactly built for it), but the mechanism of copying license compatible code into your project so you can maintain it doesn't depend on it being outside of code.dlang.org. You can do that with anything! D modules are pretty movable.

In theory yes, but in practice it can be very difficult because of the size of the dependency graph.

You import foo which needs bar which needs baz which needs joe and sally which need fred, tom, dick, and harry... it is very, very easy to fall down that rabbit hole and package managers make it look enticing.
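
A minimal sketch of that rabbit hole, using a made-up dependency graph. The package names are taken from the sentence above, the edges between them are assumed, and `transitiveDeps` is a hypothetical helper, not part of dub or Phobos. One direct import ends up pulling in eight packages.

```d
import std.algorithm : sort;

// Collect every package reachable from `root` in a dependency graph
// given as "package -> list of direct dependencies".
string[] transitiveDeps(string root, const string[][string] graph)
{
    bool[string] seen;
    string[] stack = [root];
    while (stack.length)
    {
        auto pkg = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        if (auto deps = pkg in graph)
        {
            foreach (dep; *deps)
            {
                if (dep !in seen)
                {
                    seen[dep] = true;
                    stack ~= dep;
                }
            }
        }
    }
    auto result = seen.keys;
    sort(result);
    return result;
}

void main()
{
    // Hypothetical graph: foo is the only package you asked for.
    string[][string] graph = [
        "foo": ["bar"],
        "bar": ["baz"],
        "baz": ["joe", "sally"],
        "joe": ["fred", "tom"],
        "sally": ["dick", "harry"],
    ];
    auto deps = transitiveDeps("foo", graph);
    assert(deps.length == 8);
}
```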

Heck, even I am very, very tempted to introduce a few base modules to my libs, especially now that I use dmd -i more so it would be almost transparent. And I probably would have already if I didn't type out my modules each time for so long.

So my policy isn't to package dependencies, it is to *eliminate* them. So the individual files mostly stand alone. You can import one without needing the others, not even from the same repo.
January 07, 2020
On Tuesday, 7 January 2020 at 01:45:21 UTC, Steven Schveighoffer wrote:
> On 1/6/20 6:38 PM, Laeeth Isharc wrote:
>> On Monday, 6 January 2020 at 21:05:37 UTC, JN wrote:
>>> On Monday, 6 January 2020 at 19:40:10 UTC, berni44 wrote:
>>>> b) Should we instead remove some of these? Probably std.zip is here the first candidate. (I put some work in it in the last few weeks, but it would be fine for me throwing this away.)
>>>>
>>>> It's on a gut level - just wondering, what you think about this.
>>>>
>>>> [1] https://code.dlang.org/packages/std-experimental-xml
>>>
>>> IMO all except base64 could be removed. Putting everything into the standard library made a lot of sense in the times before we got a package manager. Nowadays it might be better to simplify the standard library and just have XML, JSON, ZIP, CSV as "blessed" packages.
>> 
>> Dependencies will be our doom.  Whereas if you use something in Phobos then you have some confidence it will still build across platforms in a couple of years.
>> 
>> It also seems to make sense to work on making code.dlang.org ultra-reliable before making it necessary to use libraries before doing even basic things.
>
> Except you do need to depend on it to do basic things. Have you tried using std.xml?

Yes, and I didn't persist for long.

I wonder if sometimes there can be a tendency to let the best be the enemy of the considerably better.  I think you made an argument for replacing std.xml rather than XML not being in the standard library.  Though maybe that's a better argument than for getting rid of JSON and CSV.

> Because something lives on code.dlang.org does not mean it is unreliable. The reliability depends on the developer, not the platform.

Yes, I agree, but people are wired different ways and learnt to program in different eras and the cognitive cost of figuring out if a code.dlang.org library is any good is much greater for some people than others.  You don't even know if the project will build with the current release of DMD.  That mostly doesn't bother me personally, but it definitely does bother others.

> "The reliability depends on the developer, not the platform."

I don't agree with this.  Phobos will probably work on BSD or smartos or whatever.  And quite well-written libraries may require tweaking to build even on 32 bit Windows.

And when you have a reasonable number of dependencies there is always something breaking with new releases so the cost may not be trivial for larger projects.

> I imagine vibe.d is going to be super-reliable for years, and it lives on code.dlang.org.

Maybe.   I am very grateful to Sonke for his contribution but vibe is one of the projects that breaks frequently and you can get into a mess where you need newer versions of the compiler for other reasons and vibe doesn't compile.  Plus dub often can bring in dependencies spuriously, though that's a different problem.

> H.S. Teoh also talks about arsd. This is also available on code.dlang.org (though not exactly built for it), but the mechanism of copying license compatible code into your project so you can maintain it doesn't depend on it being outside of code.dlang.org. You can do that with anything! D modules are pretty movable.

I like arsd and think we should use it more internally.  It's pretty different though from having a module in Phobos, particularly if you want to get people who don't know D to use it, all the more so if they don't consider themselves programmers by trade.  It can be too much to absorb in one go.

> There's something to be said about having things outside the standard library not for the reasons of stability but for maneuverability. One can easily add/improve/change projects that are outside the standard library, but poor decisions in the standard library are sometimes stuck there (see std.xml).

Yes I agree but isn't that an argument for adopting a better one as part of the standard library?

Undead could also be more widely publicised and dub could be educated to ask if you want to fix old code by updating references.


> For file formats, or parsers, or really any kind of system that has innumerable ways of solving the problem, I really think the standard library is not where they should be. The benefits are cohesion with the library, and usage by the library. But there are many ways to skin the JSON cat. Especially when there's little reason for JSON to be included in Phobos other than a place for it to be.

A sensible default doesn't stop you from using something suited to your particular needs.  It also saves time and cognitive effort to know what to expect when reading code.


> Oh, and I agree that reliability of code.dlang.org (or having dub more immune to the web site going down) should be a high priority.

I wonder what we could do about that.  I could support if resources would help.


January 07, 2020
On 1/6/20 10:48 PM, Laeeth Isharc wrote:
> On Tuesday, 7 January 2020 at 01:45:21 UTC, Steven Schveighoffer wrote:
>>
>> Except you do need to depend on it to do basic things. Have you tried using std.xml?
> 
> Yes, and I didn't persist for long.
> 
> I wonder if sometimes there can be a tendency to let the best be the enemy of the considerably better.  I think you made an argument for replacing std.xml rather than XML not being in the standard library.  Though maybe that's a better argument than for getting rid of JSON and CSV.

Yes, for sure you can have a "basic" implementation for something, and then go elsewhere for more fancy implementations. Unit testing is a good example of this: the default works, and runs unit tests, but if you want fancy outputs, you go with one of the 3rd-party tools. The nice thing is that the runtime library allows you to swap out its dull minimal implementation with your fancy implementation.
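
The swap-out hook can be sketched roughly as follows, assuming druntime's `Runtime.moduleUnitTester` property from core.runtime. The function name and the output format are purely illustrative, and the program has to be built with -unittest for any tests to run.

```d
import core.runtime : Runtime;
import std.stdio : writefln;

// Replacement runner: runs each module's unittest blocks and prints one
// line per module instead of the default minimal output.
bool customModuleUnitTester()
{
    size_t failed = 0;
    foreach (m; ModuleInfo)
    {
        if (m is null)
            continue;
        if (auto test = m.unitTest)
        {
            try
            {
                test();
                writefln("ok   %s", m.name);
            }
            catch (Throwable t)
            {
                ++failed;
                writefln("FAIL %s: %s", m.name, t.msg);
            }
        }
    }
    // Returning true lets the runtime go on to call main().
    return failed == 0;
}

// Install the custom runner before the runtime starts running unit tests.
shared static this()
{
    Runtime.moduleUnitTester = &customModuleUnitTester;
}

unittest
{
    assert(1 + 1 == 2);
}

void main() {}
```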

xml and other file formats are not like that. First, there isn't a requirement in the library to have xml anywhere -- it's its own project, and simply lives in Phobos, nothing else in Phobos depends on it. Because it is of poor quality, we should replace or remove it. Of course, there are many ways to do xml, and it's hard to agree on the "right" way, which is why it sits there basically untouched for over a decade. It's also hard to make a similar mechanism like the previously mentioned unittests where you can "swap out" the implementation with what you really want.

It fits better as an add-on library. It doesn't have to be part of Phobos to be maintained by a quality team. It's just kind of tacked on, mostly because other language libraries (C#, Java, etc) at the time had an xml package, and we were doing what they did. Note that the bare minimum has been done to it, just to keep it building. I don't think that's desirable, and it's not a good look for the language.

Not only that, but there is a cost to having a poor library being the "default" one. People do not go looking right away for something else, so they waste time on it, and finally go elsewhere anyway (perhaps away from D not even knowing there is something better for it). I'd say we are better off NOT having it in there.

>> Because something lives on code.dlang.org does not mean it is unreliable. The reliability depends on the developer, not the platform.
> 
> Yes, I agree, but people are wired different ways and learnt to program in different eras and the cognitive cost of figuring out if a code.dlang.org library is any good is much greater for some people than others.  You don't even know if the project will build with the current release of DMD.  That mostly doesn't bother me personally, but it definitely does bother others.

I definitely think we should have a repository system that separates "blessed" and maintained projects from all the ones that are added by random people. Maybe like a certified project list, and in order to be on it, the project has to be maintained by the core team (and tested along with the core CI), or there has to be a promise to fix issues within a certain time period or whatnot.

Look at boost for example -- not part of the standard library, but might as well be. It lives in a space where innovation needs to move faster than standards committees.

D isn't in the same boat exactly, but we have much less leeway to make significant changes to Phobos packages than we would to make changes on external ones. It's also difficult to say "this is the one way you should use xml", when nobody is passionate enough about it.

> 
>> "The reliability depends on the developer, not the platform."
> 
> I don't agree with this.  Phobos will probably work on BSD or smartos or whatever.  And quite well-written libraries may require tweaking to build even on 32 bit Windows.

This is due to the extensive CI system we have on there. This doesn't stop other projects from doing the same.

What I meant was that code outside of Phobos can be just as reliable.

> And when you have a reasonable number of dependencies there is always something breaking with new releases so the cost may not be trivial for larger projects.

If we have the dependencies all covered, then there is not an issue. In other words, we could maintain a list of projects where all the dependencies are up to date.

> 
>> I imagine vibe.d is going to be super-reliable for years, and it lives on code.dlang.org.
> 
> Maybe.   I am very grateful to Sonke for his contribution but vibe is one of the projects that breaks frequently and you can get into a mess where you need newer versions of the compiler for other reasons and vibe doesn't compile.  Plus dub often can bring in dependencies spuriously, though that's a different problem.

I haven't had this experience. When I've updated for my project, the vibe framework seems to be very good about warning me with deprecations rather than breaking.

>> There's something to be said about having things outside the standard library not for the reasons of stability but for maneuverability. One can easily add/improve/change projects that are outside the standard library, but poor decisions in the standard library are sometimes stuck there (see std.xml).
> 
> Yes I agree but isn't that an argument for adopting a better one as part of the standard library?

But which one? Nobody can agree on what to put in there, so it stays. XML is not easy. JSON is easy (I wrote jsoniopipe in 1 week I think), and it is still in there despite other better systems existing.

> Undead could also be more widely publicised and dub could be educated to ask if you want to fix old code by updating references.

You mean move it to undead? I'm fine with that, and I'm also fine with dub being proactive about projects that depend on it. But that doesn't mean we have to replace it.

>> For file formats, or parsers, or really any kind of system that has innumerable ways of solving the problem, I really think the standard library is not where they should be. The benefits are cohesion with the library, and usage by the library. But there are many ways to skin the JSON cat. Especially when there's little reason for JSON to be included in Phobos other than a place for it to be.
> 
> A sensible default doesn't stop you from using something suited to your particular needs.  It also saves time and cognitive effort to know what to expect when reading code.

The standard library should contain a minimal or best implementation of things that are core to the language. I don't think it needs a default implementation of everything.

At one point we were looking at putting all sorts of things in there (std.database, std.io, std.kitchensink). I just think the advent of a build system that can pull in dependencies mitigates a lot of that need. It brings with it other problems, but if we can mow that grass down so you aren't wading through a forest, maybe it's not so bad.

-Steve
January 07, 2020
On Tuesday, 7 January 2020 at 00:10:08 UTC, H. S. Teoh wrote:
> After so many decades of academia and industry alike trumpeting code reuse, I'm starting to become skeptical that perhaps King Code Reuse has invisible clothes. Dependency hell is a smell suggesting that something is fundamentally wrong with the concept.

Years ago I came to the same conclusion, but in a completely different area: I was programming generators for sudoku puzzles and their variants. This looked ideal for OOP, because all those variations are quite similar, but it proved not to be. The puzzles the program produced were not really good. The reason was that every variation needs its own optimization for good results, and that was almost impossible with all that code sharing. Eventually I rewrote everything with a separate program for every variation. That worked much better, although it meant repeating code over and over again.