April 02, 2003

Bill Cox wrote:
> > There are surely better ways to advertise your project.
> ...
> I'm not trying to advertise DataDraw.  In fact, I'd love to see D
> incorporate features that would allow me to kill it.  I'd prefer that
> users didn't start adopting DataDraw, as I don't have the time to do
> free support.

Ok, I think it's good to have this said.

> It's open-source, as the copyright file describes. The documentation sucks, and I think it will probably stay that way.

That means it's dead outside of the heads of its few experts and will remain so.

> ...
> It's specific insights I've gained in working with DataDraw that I've
> been trying to describe in this group, rather than trying to promote
> DataDraw.
> ...

I'm very interested in your experiences and insights. I've been doing software projects since 1979 and feel very strongly about the way systems present themselves to the programmer (APIs).

> Through using DataDraw for many years, however, I think I've had some fairly unique insights into language design.  Adding features to a target language is what DataDraw is for, and I've been able to try out several features not found in C++ in a real industrial coding environment.  Some of those features I've described in other posts.

I'll try to reread some of your postings and arguments. Can you give me some hints to find my way?

> As I said, I was hoping D could be extended to make DataDraw obsolete. That turns out not to be the case.  I'll describe some of my current thinking about this matter below.
> 
> DataDraw currently just models data structures, and allows me to write code generators.  This is much like the old OM tool for UML (which DataDraw precedes).  It gives me the power of compile-time reflection classes, like those in OpenC++.  However, for each new language, or coding style, I have to write a new code generator, and these things get really complex.  DataDraw currently has 5.  That kind of sucks.
> 
> Instead, DataDraw should allow me to write one awesome code generator that targets an intermediate language.  Then, it should allow me to write simple translators for each target language and coding style.  The bulk of the work could then be shared.

That's a natural idea that doesn't seem to work. I think that Charles
Simonyi has put 10 years into Intentional Programming to follow similar
ideas and they burned millions of $.

> With a built-in language translator, DataDraw would be much simpler than it is now.  However, with a built-in language translator, DataDraw becomes a language in itself.  What's unique about it?  Simple.  It's extendable by me and others I work with who are familiar with the DataDraw code base.  I can generate code of any type, and add literally any feature I wish.  However, I do that by directly editing the code generators, which are written in C and which link into DataDraw's database.  That's not elegant, or usable by anyone not familiar with the DataDraw code base, although it does cover my needs.

This is a certain way to solve problems but it may or may not be optimal.
The fact that you have this tool at hand gives power but may mislead.

> So, I've been looking into what it takes to get the same power, but in a language that anyone could work with.  In particular, I've been examining what it would take for D to cover DataDraw's functionality.

Analytically this is not a goal. The goal is to enable programmers to write great applications. What are their problems and how can they be solved?

> That, it turns out, is hard (which is one reason the XL compiler isn't done).  The more power you give the user, the more you open up the internals of the compiler, and the more complex you make the language.

I agree. I think this is the problem of C++ itself. Too much complexity
for too little gain.

> For example, to do that in D, a natural way would be to make Walter's
> representation of D as data structures part of the language definition
> (thus greatly restricting how D compilers are built).  Then, you could
> offer access to reflection classes at compile time (as OpenC++ does).  A
> natural way to use these classes at compile time is to interpret D code.
>   Now, you have to write a D interpreter as well as a compiler.  This is
> the approach taken by VHDL for their generators, and it really
> complicated implementations of compilers.  An alternative is to
> re-compile the compiler instead.  This is a bit brain-bending, but I
> think getting rid of the interpreter is worth it.  Besides, I already
> recompile DataDraw every time I fix or add a feature, and that's never
> been much of a problem.
> 
> Even if we added compile-time reflection classes, I still don't get all the power of DataDraw, which I can extend in any way, because I directly edit the source.  What's still missing?
> 
> For one thing, reflection classes can't be used to add syntax to the language.  That's a serious limitation.  XL's approach allows some syntax extension.  Scheme also has a nice mechanism.  However, both systems are limited, and complex, and slow.  I'm toying with another approach that is easy if you already allow users to compile custom versions of the compiler (which you do to get rid of the interpreter).  Just provide a simple mechanism for generating a syntax description for use by bison. That nails the problem.  Any new syntax can then be added by a user, so long as it's compatible with what's already there.  A drawback is that bison now becomes part of the language, along with all its quirks and strong points.  At least bison is pretty much available everywhere.

I still don't know what problems you are trying to solve.

A language that is able to extend its own syntax? Surely a fascinating idea
but 99.9 percent of programmers would not be able to make good use of it.

> Just adding new syntax to the language doesn't get you all the way there.  You still are stuck with those reflection classes used to model the language.  If you have a new construct to implement, you can add the syntax, but what objects do you build to represent it?  The reflection classes themselves need to be extendable.  Really.  At that point, nothing in the language is left as non-configurable.  You're stuck with LALR(1) parsers, but that's no big deal.
> 
> However, adding reflection classes is tricky.  Being C-derived, the language still needs to link with the C linker, including the compiler itself, especially if users are going to compile custom compilers for their applications.  That means that new types can't be added to the compiler's database, since C libraries are limited that way.  I'm currently toying with the age-old style of non-typed syntax trees rather than fully typed reflection classes.  It looks like it will work out, but in the end, all this has done is provide a compiler that's easy to extend.  It's easy to extend because its parser and internal data structures are simple and extendable.  Plug-ins should be easy to write.  However, it's not really a standard language any more.  It's just a customizable compiler that's fairly easy to work with.
> 
> I'm left with the conclusion that D can't be enhanced to be extendable the way XL wants to be, or the way I'd like D to be.

As I see it D was never designed to have an extensible syntax.

> I don't see how D can get there from here.

For this reason it is unreasonable to think it could go there.

Currently I don't understand why it should go there, other than it would allow you to carry your DataDraw methods of problem solving on to D.

But, as I said, I'll try to read some of your threads.

--
Helmut Leitner    leitner@hls.via.at   Graz, Austria   www.hls-software.com

April 03, 2003
I agree with all your comments.

At this point, I'm not advocating major changes to D, so this reply is more just to answer your questions than to give Walter any ideas.  You'd asked about specific features I'd been advocating, so I'll re-summarize them below.

1) Compile-time reflection classes.  I threw this out there as a possibility to be investigated.  Now that I've done that, I'm dropping that request, for reasons described in the post you replied to below.

2) I'd still like to see more powerful iterators than the ones discussed lately.  You can look up my recommendations under "Cool iterators", or something like that.

3) Dynamic class extensions are also a great thing, and it's sad C++, Java, C# and D don't have them.  Most programmers working with object databases have to emulate the extensions with cross-coupled void pointers.

4) A class framework inheritance mechanism, such as Sather's "include" construct, virtual classes, or Dan's "Template Frameworks".  All of these cover a gaping hole in C++, but I'm concerned about the complexity of the virtual class approach Walter was considering.
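
As a rough illustration of item 3, a dynamic class extension can be emulated with a side table that attaches a new field to existing objects without touching the class definition. This is only a sketch in Python, chosen for brevity; it is not how DataDraw or D does it, and the names `Extension`, `Net`, and `delay` are hypothetical.

```python
import weakref


class Extension:
    """A field added to existing objects at run time, stored outside them."""

    def __init__(self, default=None):
        self._default = default
        # Weak keys: the extension never keeps an extended object alive.
        self._values = weakref.WeakKeyDictionary()

    def get(self, obj):
        return self._values.get(obj, self._default)

    def set(self, obj, value):
        self._values[obj] = value


class Net:
    """Stand-in (hypothetical) for a class owned by code we cannot edit."""

    def __init__(self, name):
        self.name = name


# A tool that needs a per-Net "delay" adds it without changing Net.
delay = Extension(default=0.0)
clock = Net("clk")
reset = Net("rst")
delay.set(clock, 1.5)
print(delay.get(clock), delay.get(reset))
```

Python would also let you assign `clock.delay` directly; the side table is used here because it is closer to what a statically typed language has to do, and it plays the role of the cross-coupled pointers without either class knowing about the other.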

Embedded replies to a couple questions you posed are below.

Helmut Leitner wrote:
> 
> Bill Cox wrote:
> 
>>>There are surely better ways to advertise your project.
>>
>>...
>>I'm not trying to advertise DataDraw.  In fact, I'd love to see D
>>incorporate features that would allow me to kill it.  I'd prefer that
>>users didn't start adopting DataDraw, as I don't have the time to do
>>free support.
> 
> 
> Ok, I think it's good to have this said.
> 
> 
>>It's open-source, as the copyright file describes.  The documentation sucks, and I think it will probably stay that way.
> 
> 
> That means it's dead outside of the heads of its few experts and
> will remain so.
> 
> 
>>...
>>It's specific insights I've gained in working with DataDraw that I've
>>been trying to describe in this group, rather than trying to promote
>>DataDraw. ...
> 
> 
> I'm very interested in your experiences and insights. I've been doing
> software projects since 1979 and feel very strongly about the way
> systems present themselves to the programmer (APIs).
> 
> 
>>Through using DataDraw for many years, however, I think I've had some
>>fairly unique insights into language design.  Adding features to a
>>target language is what DataDraw is for, and I've been able to try out
>>several features not found in C++ in a real industrial coding
>>environment.  Some of those features I've described in other posts.
> 
> 
> I'll try to reread some of your postings and arguments. Can you give me some hints to find my way?
> 
> 
>>As I said, I was hoping D could be extended to make DataDraw obsolete.
>>That turns out not to be the case.  I'll describe some of my current
>>thinking about this matter below.
>>
>>DataDraw currently just models data structures, and allows me to write
>>code generators.  This is much like the old OM tool for UML (which
>>DataDraw precedes).  It gives me the power of compile-time reflection
>>classes, like those in OpenC++.  However, for each new language, or
>>coding style, I have to write a new code generator, and these things get
>>really complex.  DataDraw currently has 5.  That kind of sucks.
>>
>>Instead, DataDraw should allow me to write one awesome code generator
>>that targets an intermediate language.  Then, it should allow me to
>>write simple translators for each target language and coding style.  The
>>bulk of the work could then be shared.
> 
> 
> That's a natural idea, that doesn't seem to work. I think that Charles
> Simonyi has put 10 years into Intentional Programming to follow similar
> ideas and they burned millions of $.

I believe it.  The hard part isn't making a nice intermediate language I can work with.  The hard part is making an extendable version that anyone can work with.

>>With a built-in language translator, DataDraw would be much simpler than
>>it is now.  However, with a built-in language translator, DataDraw
>>becomes a language in itself.  What's unique about it?  Simple.  It's
>>extendable by me and others I work with who are familiar with the
>>DataDraw code base.  I can generate code of any type, and add literally
>>any feature I wish.  However, I do that by directly editing the code
>>generators, which are written in C and which link into DataDraw's
>>database.  That's not elegant, or usable by anyone not familiar with the
>>DataDraw code base, although it does cover my needs.
> 
> 
> This is a certain way to solve problems but it may or may not be optimal. The fact that you have this tool at hand gives power but may mislead.

You're right about that.  You have to be extremely careful about adding features to a language using a custom pre-processor.  In particular, every extension has to be carefully thought out, and agreed to by the whole group.  If anyone could add a feature any time they wished, it'd result in mayhem.

>>So, I've been looking into what it takes to get the same power, but in a
>>language that anyone could work with.  In particular, I've been
>>examining what it would take for D to cover DataDraw's functionality.
> 
> 
> Analytically this is not a goal. The goal is to enable programmers to write great applications. What are their problems and how can they be solved? 

Oh, there are lots of problems.  Big stuff and little stuff.  How about array bounds checking in debug mode?  We added it to C.  Need a few fields added to existing classes at run-time?  We do that.  The space of solutions to real problems programmers are facing out there is a lot bigger than what most languages address.

I agree with your point, though.  A good D design is a design that covers most people's most common needs, but not all of anybody's needs.  IMO, D's basically on track.

>>That, it turns out, is hard (which is one reason the XL compiler isn't
>>done).  The more power you give the user, the more you open up the
>>internals of the compiler, and the more complex you make the language.
> 
> 
> I agree. I think this is the problem of C++ itself. Too much complexity
> for too little gain.
>  
> 
>>For example, to do that in D, a natural way would be to make Walter's
>>representation of D as data structures part of the language definition
>>(thus greatly restricting how D compilers are built).  Then, you could
>>offer access to reflection classes at compile time (as OpenC++ does).  A
>>natural way to use these classes at compile time is to interpret D code.
>>  Now, you have to write a D interpreter as well as a compiler.  This is
>>the approach taken by VHDL for their generators, and it really
>>complicated implementations of compilers.  An alternative is to
>>re-compile the compiler instead.  This is a bit brain-bending, but I
>>think getting rid of the interpreter is worth it.  Besides, I already
>>recompile DataDraw every time I fix or add a feature, and that's never
>>been much of a problem.
>>
>>Even if we added compile-time reflection classes, I still don't get all
>>the power of DataDraw, which I can extend in any way, because I directly
>>edit the source.  What's still missing?
>>
>>For one thing, reflection classes can't be used to add syntax to the
>>language.  That's a serious limitation.  XL's approach allows some syntax
>>extension.  Scheme also has a nice mechanism.  However, both systems are
>>limited, and complex, and slow.  I'm toying with another approach that is
>>easy if you already allow users to compile custom versions of the
>>compiler (which you do to get rid of the interpreter).  Just provide a
>>simple mechanism for generating a syntax description for use by bison.
>>That nails the problem.  Any new syntax can then be added by a user, so
>>long as it's compatible with what's already there.  A drawback is that
>>bison now becomes part of the language, along with all its quirks and
>>strong points.  At least bison is pretty much available everywhere.
> 
> 
> I still don't know what problems you are trying to solve. 
> 
> A language that is able to extend its own syntax? Surely a fascinating idea but 99.9 percent of programmers would not be able to make good use of it.
>
You're right about how many programmers should use it.  It's dangerous stuff, and extensions need to be carefully considered by a few and then adopted by many.  Scheme has a nice mechanism for this kind of thing. Much of the syntax of Scheme can actually be written in Scheme.

However, without an ability to add syntax, some new features can't cleanly be added to a language, and thus the language isn't fully extensible.  For example, how could we add Sather-like "include" constructs to allow module level inheritance?  There's no way in D, C++, Java, or C# to even say that.  To add this feature, you need to hack the parser a little.  After that, it's a simple thing to implement with compile-time reflection classes.

I'm not pushing for any syntax extension mechanism for D.  It's pretty worthless without some way to tie it into reflection classes or an equivalent mechanism.
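
To make the idea of user-supplied syntax a little more concrete, here is a minimal sketch of merging an extension into a rule table and rendering the result in bison's grammar-file layout (declarations, `%%`, rules). It is not the mechanism discussed in this thread: the grammar, the `INCLUDE` token, and the helpers `extend` and `emit_bison` are all invented for illustration, and the output has no semantic actions.

```python
def emit_bison(tokens, rules):
    """Render token names and {nonterminal: [alternatives]} as bison text."""
    lines = [f"%token {name}" for name in sorted(tokens)]
    lines.append("%%")
    for lhs, alternatives in rules.items():
        body = "\n    | ".join(" ".join(alt) for alt in alternatives)
        lines.append(f"{lhs}\n    : {body}\n    ;")
    return "\n".join(lines) + "\n"


def extend(tokens, rules, new_tokens, new_rules):
    """Merge an extension into a grammar without modifying the originals."""
    merged = {lhs: list(alts) for lhs, alts in rules.items()}
    for lhs, alts in new_rules.items():
        merged.setdefault(lhs, []).extend(alts)
    return tokens | new_tokens, merged


# Hypothetical base grammar: a class body is a list of field declarations.
base_tokens = {"IDENT", "CLASS"}
base_rules = {
    "class_decl": [["CLASS", "IDENT", "'{'", "members", "'}'"]],
    "members": [["member"], ["members", "member"]],
    "member": [["IDENT", "IDENT", "';'"]],
}

# Hypothetical extension: a Sather-like "include" member.
tokens, rules = extend(
    base_tokens, base_rules,
    {"INCLUDE"}, {"member": [["INCLUDE", "IDENT", "';'"]]},
)
print(emit_bison(tokens, rules))
```

The extension only appends an alternative to an existing nonterminal. Whether the merged grammar stays free of conflicts is something only the parser generator can tell you, which is the "compatible with what's already there" caveat.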

>>Just adding new syntax to the language doesn't get you all the way
>>there.  You still are stuck with those reflection classes used to model
>>the language.  If you have a new construct to implement, you can add the
>>syntax, but what objects do you build to represent it?  The reflection
>>classes themselves need to be extendable.  Really.  At that point,
>>nothing in the language is left as non-configurable.  You're stuck with
>>LALR(1) parsers, but that's no big deal.
>>
>>However, adding reflection classes is tricky.  Being C-derived, the
>>language still needs to link with the C linker, including the compiler
>>itself, especially if users are going to compile custom compilers for
>>their applications.  That means that new types can't be added to the
>>compiler's database, since C libraries are limited that way.  I'm
>>currently toying with the age-old style of non-typed syntax trees rather
>>than fully typed reflection classes.  It looks like it will work out,
>>but in the end, all this has done is provide a compiler that's easy to
>>extend.  It's easy to extend because its parser and internal data
>>structures are simple and extendable.  Plug-ins should be easy to
>>write.  However, it's not really a standard language any more.  It's
>>just a customizable compiler that's fairly easy to work with.
>>
>>I'm left with the conclusion that D can't be enhanced to be extendable the
>>way XL wants to be, or the way I'd like D to be.
> 
> 
> As I see it D was never designed to have an extensible syntax.  
> 
>>I don't see how D can get there from here.
> 
> 
> For this reason it is unreasonable to think it could go there.
> 
> Currently I don't understand why it should go there, other than it would allow you to carry your DataDraw methods of problem solving on to D. 
> 
> But, as I said, I'll try to read some of your threads.
> 
> --
> Helmut Leitner    leitner@hls.via.at   Graz, Austria   www.hls-software.com

I agree.  At this point, I've concluded that D should not try to solve the problems I solve with DataDraw.

I've started working on a new system that should replace DataDraw when finished.  It's already got the syntax extension mechanism I described that generates a bison file.  It's got a simple list-based language parse tree that is capable of representing any feature I wish to support.  These get used like compile-time reflection classes, allowing users to write code in the intermediate language in order to add features to the target language.  The output can be in any language (as with DataDraw), and users can write new generators to target new languages or coding styles.
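
A list-based parse tree of this kind might look roughly like the following sketch, where every node is a plain list headed by a symbol and a generator walks the lists to emit C text. This is an assumption about the general shape, not the actual system; the node heads (`func`, `param`, `return`) and the `to_c` function are hypothetical.

```python
# Each node is a plain list: [head, child, child, ...]. Leaves are strings.
tree = ["func", "int", "square", [["param", "int", "x"]],
        [["return", ["*", "x", "x"]]]]


def to_c(node):
    """Translate a list-based tree into C source text."""
    if isinstance(node, str):
        return node
    head, *args = node
    if head == "func":
        ret, name, params, body = args
        plist = ", ".join(to_c(p) for p in params)
        stmts = "\n".join("    " + to_c(s) for s in body)
        return f"{ret} {name}({plist})\n{{\n{stmts}\n}}"
    if head == "param":
        return f"{args[0]} {args[1]}"
    if head == "return":
        return f"return {to_c(args[0])};"
    if head in ("+", "-", "*", "/"):
        return f"({to_c(args[0])} {head} {to_c(args[1])})"
    raise ValueError(f"no C generator for construct {head!r}")


print(to_c(tree))
```

Supporting a new construct means inventing a new head symbol and adding a branch to each generator; no new node type has to be declared. A generator for a different target language would be another function over the same lists. The price is that nothing checks the shape of a node until a generator trips over it.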

I'm thinking of calling it Hack-C, since allowing me to hack in new features to C or other languages is its primary function, and because the whole system seems like one of the world's largest hacks.  It's a translator that compiles application specific versions of itself in order to add features to other languages.  The opportunities for serious hacking in such a system are vast.

If you think there might be interest in this system in the open-source community, I could try to finish its development that way.  It might be fun enough for me to actually support an open-source effort, and if anyone else were to help, I could benefit from that.

I haven't seen much interest in this kind of project out there in the past.  Languages are always hot, but CASE tools never are.  Do you think this could be successful as an open-source effort?

Bill

April 03, 2003
Hmm... Mark, appreciating all your informedness and
very welcome sharp and clear view on this matter (and
others), how about improving your diplomatic skills
a bit?

Sorry about the noise.
The Luna Kid


April 10, 2003
Peter Hercek wrote:
> Well, I went through character and code page problems too about a year
>  ago. Very bad experience in C/C++ ... (I'm from place where 7 bits
>  is not enough). I have two points about this:

Me too :)

> 1) D should support characters and not bytes (8 bits) or words (16 bits);
>  when I'm indexing a string I do so by characters and not by a byte multiple;
>  if I wanted to index by e.g. bytes I would ask for the string's byte length and
>  cast to a byte array

Right.

> 2) Support for 3 character types (UTF8, UTF16, UTF32) is handy, but
>  not critical (can be solved by conversion functions); actually for one
>  character only, UTF32 has the shortest representation; it may be also
>  interesting not to be able to specify the exact encoding for a string
>  (as opposed to an encoding for a character) - let the compiler decide
>  what is the best representation (maybe some optimization can be
>  achieved based on this later; e.g. the compiler can decide to store strings
>  in partially balanced trees like STLPort does for ropes, but with
>  possibly different encodings for different nodes ... whatever, just
>  writing down my thoughts)

UTF-32 doesn't have the shortest representation, since "in all 3 encodings [i.e. UTF-8/16/32] the maximum possible character representation length is 4 bytes", as the official description says. Though i agree that it's the most practical one, in part because working with an array of longs is nowadays faster than an array of shorts.
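
The per-character lengths are easy to check. The sketch below, in Python purely for convenience, encodes a few sample code points in all three forms and prints the byte counts; the sample characters and the helper name `encoded_lengths` are my own choice.

```python
# a, a-umlaut, euro sign, musical G clef (outside the 16-bit range)
samples = ["a", "\u00e4", "\u20ac", "\U0001d11e"]


def encoded_lengths(ch):
    """Return the byte length of one code point in UTF-8, UTF-16 and UTF-32."""
    # The "-le" codec names avoid the byte order mark Python would otherwise add.
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))


for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-32 always takes 4 bytes, so for a single character it is never shorter than the other two forms; at best it ties with them.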

This is an implementation detail and should not matter though, because whatever the string implementation is, it should hide the underlying complexity.

What matters though is that in Unicode there are 2 kinds of characters - normal and modifiers. So an "ä" can also be represented as an "a" followed by a special accent symbol. I'm pretty sure you want to access these as a whole, not separately.
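
A small sketch of the two representations, using Python's standard `unicodedata` module. The `clusters` helper is hypothetical and deliberately simplified: it only attaches combining marks to the preceding character, whereas the full segmentation rules in the Unicode standard cover more cases.

```python
import unicodedata

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


# The two spellings differ in code point count but normalize to the same text.
print(len(precomposed), len(decomposed))
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(clusters("h" + decomposed + "t"))
```

Indexing by code point would split the decomposed form into "a" and the accent; grouping them is what "access these as a whole" amounts to.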

-i.

April 10, 2003
Walter wrote:
> That's only partially true - the downside is that when you need high
> performance you'll need byte indices, not UTF character strides. There is no
> getting away from the variable byte encoding. In my (limited) experience
> with string processing and UTF-8, rarely is it necessary to decode it. Most
> manipulation is done with indices.

Wait... won't language-supported iterators remove the need for accessing the
underlying array indices directly? I *definitely* don't want to know
anything about the underlying format, which can be really anything -
UTF-8/16/32, or even an aggregate of 2 arrays like i or Mark have proposed.
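
As a sketch of what such an iterator could look like (in Python, and assuming nothing about how D would do it), the hypothetical `Text` class below stores bytes in an arbitrary encoding and hands out characters one at a time, so the caller never sees byte indices.

```python
import codecs


class Text:
    """A string whose storage encoding is hidden behind iteration."""

    def __init__(self, data, encoding):
        self._data = data
        self._encoding = encoding

    def __iter__(self):
        # Decode lazily, one stored byte at a time; callers only see characters.
        decoder = codecs.getincrementaldecoder(self._encoding)()
        for byte in self._data:
            ch = decoder.decode(bytes([byte]))
            if ch:
                yield ch
        decoder.decode(b"", final=True)  # raises if the data ends mid-character


word = "Gr\u00fc\u00dfe"
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    stored = Text(word.encode(encoding), encoding)
    print(encoding, list(stored))
```

The loop body is the same for all three storage formats. What this does not give you is constant-time random access, which is the cost Walter is pointing at.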

Walter, you also don't: look what i found in this newsgroup. :)
And you claim it to be better to work with pointers into a char[], pretending it was a UTF-8 string!!!
--- 8< ---
At one time I had written a lexer that handled utf-8 source. It turned out to cause a lot of problems because strings could no longer be simply indexed by character position, nor could pointers be arbitrarily incremented and decremented.

It turned out to be a lot of trouble  and I finally converted it to wchar's.
--- >8 ---


BTW, as to the possibilities that Mark wishes for himself, i've dug up his message, which was posted before i was around. Here it is.

--- 8< ---
Short summaries here:

http://www.nmt.edu/tcc/help/lang/icon/positions.html
http://www.nmt.edu/tcc/help/lang/icon/substring.html
http://www.cs.arizona.edu/icon/docs/ipd266.htm
http://www.toolsofcomputing.com/IconHandbook/IconHandbook.pdf
Sections 6.2 and following.

Icon is simply unsurpassed in string processing and is for that reason famous among linguists.  There is more to the string processing than just character position indices.  Icon supports special clauses called "string scanning environments" which work like file i/o in a vague analogy.  (See third link above, section 3.)

Icon also has nice built-in structures like sets (*character sets* turn out to be insanely useful), hash tables, and lists.  Somehow Icon never made it to the Big Leagues and that is a shame.  It deserves to be up there with Perl.  Icon is wicked fast when written correctly.

The Unicon project is the next-generation Icon, and has added objects and other modern features to base Icon.  It is on SourceForge.

(There was only one project in which I recall desiring a new Icon built-in.  I wanted a two-way hash table which could index off of either data column.  The workaround was to implement two mutually mirroring one-way hash tables.)

Icon has a very interesting 'success/failure' paradigm which might also be something to study, esp. in light of D's contract emphasis.  The unique 'goal-directed' paradigm is quite interesting but may have no application to D.

I have for a very long time desired Icon's string scanning capabilities in my C/C++ programs.  Even with std::string or string classes from various class libraries (I've used them all), there is just no comparison with Icon.  I would become a total D convert if it could do strings like Icon.

Mark

http://www.cs.arizona.edu/icon/
http://unicon.sourceforge.net/index.html
--- >8 ---


-i.

April 11, 2003

Ilya Minkov wrote:
> I have for a very long time desired Icon's string scanning capabilities in my C/C++ programs.  Even with std::string or string classes from various class libraries (I've used them all), there is just no comparison with Icon.  I would become a total D convert if it could do strings like Icon.

Being used to Perl, I think that the current D regex module has to be extended.

In what way does Icon differ (or have advantages) in string processing
compared to Perl?

-- 
Helmut Leitner    leitner@hls.via.at
Graz, Austria   www.hls-software.com

May 21, 2003
"Mark Evans" <Mark_member@pathlink.com> wrote in message news:b6beep$1qom$1@digitaldaemon.com...
> Maybe it will mend fences to say in public that UTF-32 could be dropped. I have
> objective reasons for saying so, not vague unease: UTF-32 is rarely used and
> truly fixed-width (so it can be 'faked' as Walter suggests).  Nonetheless
> intrinsic UTF-32 is just as reasonable to support as, say, the equally rarely
> used, and equally fake-able 'ifloat' type.

My understanding is that the Linux wchar_t type is UTF-32, which puts it in common use. UTF-32 is also handy as an intermediate form when converting between UTF-8 and UTF-16.
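
The intermediate-form idea can be sketched as two small steps: decode UTF-8 into a list of code points (which is what UTF-32 stores), then encode each code point as one or two 16-bit units. The function names are hypothetical, and the first step leans on Python's built-in decoder rather than doing the bit work by hand.

```python
def utf8_to_code_points(data):
    """Decode UTF-8 bytes into a list of code points (the UTF-32 view)."""
    return [ord(ch) for ch in data.decode("utf-8")]


def code_points_to_utf16(code_points):
    """Encode code points as a list of 16-bit UTF-16 code units."""
    units = []
    for cp in code_points:
        if cp < 0x10000:
            units.append(cp)
        else:
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))     # high surrogate
            units.append(0xDC00 | (cp & 0x3FF))   # low surrogate
    return units


utf8 = "a\u20ac\U0001d11e".encode("utf-8")
points = utf8_to_code_points(utf8)
units = code_points_to_utf16(points)
print([hex(p) for p in points])
print([hex(u) for u in units])
```

Each half only has to understand one variable-width format, which is the convenience of routing through fixed-width code points.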


May 21, 2003
"Mark Evans" <Mark_member@pathlink.com> wrote in message news:b6bb6i$1ont$1@digitaldaemon.com...
> What would be nice is to make Unicode maximally simple and maximally efficient
> for D users.

I appreciate the thought, but carrying around an extra array for each string seems difficult to make work, especially in view of slicing, etc. I don't think there's any way to design the language so it is both efficient at dealing with ordinary ASCII, and transparently able to do multibytes.


May 21, 2003
"Sean L. Palmer" <palmer.sean@verizon.net> wrote in message news:b6bjg5$1ut5$1@digitaldaemon.com...
> "Matthew Wilson" <dmd@synesis.com.au> wrote in message news:b6bgt5$1sai$1@digitaldaemon.com...
> > This sounds like a nice idea - array of 1st-byte plus lookups. I'm intrigued
> > as to the nature of the lookup table. Is this a constant, process-wide, entity?
>
> No, because the map is indexed by the same index used to index into the flat
> array.  Unless I'm misunderstanding something.

You could use a static 256 byte lookup table to give you the 'stride' to the next char.
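
A sketch of such a table, written in Python for brevity. The ranges assume the current definition of UTF-8, where sequences are at most 4 bytes and the lead bytes 0xC0, 0xC1 and 0xF5 and above are invalid; the names `STRIDE` and `char_offsets` are made up for the example.

```python
def build_stride_table():
    """Map every possible first byte to the length of its UTF-8 sequence."""
    table = bytearray(256)
    for b in range(256):
        if b < 0x80:
            table[b] = 1          # 0xxxxxxx: ASCII
        elif 0xC2 <= b <= 0xDF:
            table[b] = 2          # 110xxxxx
        elif 0xE0 <= b <= 0xEF:
            table[b] = 3          # 1110xxxx
        elif 0xF0 <= b <= 0xF4:
            table[b] = 4          # 11110xxx
        # Continuation bytes and invalid lead bytes keep stride 0.
    return bytes(table)


STRIDE = build_stride_table()


def char_offsets(data):
    """Return the byte offset of each character in well-formed UTF-8."""
    offsets = []
    i = 0
    while i < len(data):
        step = STRIDE[data[i]]
        if step == 0:
            raise ValueError(f"byte at offset {i} cannot start a character")
        offsets.append(i)
        i += step
    return offsets


print(char_offsets("a\u00e4\u20ac\U0001d11e!".encode("utf-8")))
```

The table is a process-wide constant because it depends only on the value of the first byte, not on the string. Finding the n-th character is still a linear walk, though each step is one table lookup.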


May 21, 2003
"Mark Evans" <Mark_member@pathlink.com> wrote in message news:b6dolr$di3$1@digitaldaemon.com...
> You run into problems only with large UTF-8 strings that are frequently passed
> to/from Unicode APIs.  Windows uses UTF-16 so it's no problem.  Where you find
> UTF-8 happening is on the web, but that has inherent delays of its own, so the
> cost might go unnoticed.  Consider for example that plenty of web sites are
> driven with UTF-8 by languages far slower than D.

I've been looking at some books for programming CGI apps in C. I see the dreaded buffer overflow errors in the sample code even in highly regarded books. No wonder security is such a mess! Doing CGI in D would eliminate those problems.