New web newsreader - requesting participation
Jan 31, 2011
Adam Ruppe
January 31, 2011
In the other newsgroup, I've been talking about a little web news program I've been writing as a spinoff of the potential new homepage idea.

It's to the point where it is usable, but still kinda buggy:

http://arsdnet.net/d-web-site/nntp/thread-index?newsgroup=digitalmars.D

Source code: http://arsdnet.net/d-web-site/nntp.d

NOTE: it does /not/ automatically check for new posts. I have to manually trigger that right now (I don't want it annoying the news server automatically while still in the testing phase.)

It will lazily load a message on demand though if you know its message ID: http://arsdnet.net/d-web-site/nntp/get-message

Get it from the Message-ID header in the post.



Anyway, here are the features:

a) It isn't god-awful slow. The PHP web news currently on Digital Mars, as best as I can tell, actually polls the news server every time you go to its index! This does aggressive local caching.

b) It actually lets you select text...

OK, if I list every annoyance with the current web news, I'll never stop. Moving on to new things:

c) It tries to convert news posts to HTML, so the paragraphs wrap to the browser, links work, quotes are put into the proper tags for indentation, and it tries to auto-detect D code and put it in a <pre> block - which my javascript can make inline editable and runnable. Example:

http://arsdnet.net/d-web-site/nntp/get-message?newsgroup=digitalmars.D&messageId=%3Cmailman.1085.1296409409.4748.digitalmars-d%40puremagic.com%3E

With script disabled, you'll see the code in a different colored block. With script enabled, you'll see an Edit button there too.

d) It tries to convert HTML emails back to plain text. (Ironically, so it can turn it back to HTML...) This gives uniformity across the various MIME types. Similarly, if the type is multipart/alternative, it will only show the text version.

e) It also makes an attempt to preserve deliberate whitespace, for things like ASCII art or purposefully short lines. If it can't make heads or tails of it, it bails out and shows the original message in a <pre> block for human consumption.

f) Tries to be fast and lean.

g) Written in D!

h) Already-read messages are tracked by your browser: if a link has been visited, it shows up in a different color.
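
As an illustration of item (d), here is a minimal Python sketch (not the actual nntp.d code, which is written in D) of picking the plain-text alternative out of a multipart/alternative message with the standard email package. The sample message and the name plain_text_body are made up for the example.

```python
from email import message_from_string
from email.policy import default


def plain_text_body(raw_message: str) -> str:
    """Return the text/plain body of a message, or "" if it has none."""
    msg = message_from_string(raw_message, policy=default)
    # Ask only for the plain-text alternative; HTML parts are ignored.
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


# Hypothetical message with both a plain-text and an HTML alternative.
SAMPLE = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Hello in plain text.\n"
    "--XYZ\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>Hello in <b>HTML</b>.</p>\n"
    "--XYZ--\n"
)

if __name__ == "__main__":
    print(plain_text_body(SAMPLE))
```

A message that only carries HTML would return an empty string here; converting that HTML back to text is a separate step this sketch does not attempt.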

Coming as I find time:

a) References to bugzilla entries should be automatically converted to links.

b) Viewing threads by date or by threaded view.

c) Posting with the option of automatic quoting.

d) Syntax highlighting of D code in posts.

e) Maybe, maybe links to documentation of functions referenced,
   if I can find a good way to get them automatically. Integration
   with my dpldocs.info site is the way I'd do it.

f) Any more ideas? I'm reluctant to add too much, but if I like
   an idea - or if you want to write the code :) - I'll be open
   to adding it.
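
For planned item (a), a rough Python sketch of what turning Bugzilla references into links might look like. The tracker URL, the pattern for what counts as a reference, and the function name are all assumptions for illustration, not anything taken from nntp.d.

```python
import html
import re

# Hypothetical tracker address; substitute the real Bugzilla URL.
BUG_URL = "https://bugs.example.org/show_bug.cgi?id={id}"

# Assumed reference style: "bug 1234", "Bug #1234", "issue 1234".
BUG_REF = re.compile(r"\b(bug|issue)\s+#?(\d+)\b", re.IGNORECASE)


def link_bug_references(text: str) -> str:
    """Escape text for HTML, then wrap bug references in links."""
    escaped = html.escape(text)

    def to_link(match):
        url = BUG_URL.format(id=match.group(2))
        return '<a href="{}">{}</a>'.format(html.escape(url), match.group(0))

    return BUG_REF.sub(to_link, escaped)


if __name__ == "__main__":
    print(link_bug_references("See bug 1234 & friends"))
```

Escaping first and linking second keeps any markup in the post body inert while still producing real anchors.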


Known bugs:

Lots of content types aren't handled right and it ignores character encoding.

It doesn't always recognize code. This would be OK, but if it flags one line as code and misses a neighboring line of the same snippet, the result confuses the reader. Example:

http://arsdnet.net/d-web-site/nntp/get-message?newsgroup=digitalmars.D&messageId=%3Cii4lbj%242bes%241%40digitalmars.com%3E

(Look for "auto str =")

The reason for this is that it detects code lines by looking for semicolons and open braces. It will call something a generic <pre> if there's a lot of whitespace in it - figuring it is probably ASCII art (if it thinks the whitespace has human significance, it tries to preserve it), but it still isn't a perfect detection function.

I'm open to ideas. We want to detect code, but not flag regular English text.
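
To make the heuristic concrete, here is a small Python sketch of a per-line check of this kind. It is an approximation written for this thread, not the code in nntp.d, and the exact rule (line ends in ';' or '{') is an assumption based on the description above.

```python
def looks_like_code(line: str) -> bool:
    """Guess whether one line is code: it ends in ';' or '{'."""
    return line.rstrip().endswith((";", "{"))


def classify(lines):
    """Pair each line with 'code' or 'text' using the per-line guess."""
    return [("code" if looks_like_code(l) else "text", l) for l in lines]


if __name__ == "__main__":
    sample = [
        "Here is what I mean:",
        "void main() {",
        "    auto str =",
        '        "hello";',
        "}",
    ]
    for kind, line in classify(sample):
        print(f"{kind:4} | {line}")
```

In this sketch a statement split across lines, such as the "auto str =" line in the sample, is labeled as text even though its neighbors are labeled as code, which is one way a snippet could end up broken apart.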



I'm also open to graphical styling ideas. I put up a dark
theme here because the white was hurting my eyes, but I change
my mind on whether I like light or dark almost at random. (Depends
on the room's lighting conditions, I think.) But I didn't do any
more graphic setup other than the max-width.

Multiple color schemes is an idea I like.



BTW, as a fun fact, this post is about 1/4th the size of the entire nntp.d code file!
January 31, 2011
"Adam Ruppe" <destructionator@gmail.com> wrote in message news:ii592i$c09$1@digitalmars.com...
>
> c) It tries to convert news posts to HTML, so the paragraphs wrap to the browser, links work, quotes are put into the proper tags for indentation, and it tries to auto-detect D code and put it in a <pre> block - which my javascript can make inline editable and runnable. Example:
>
> http://arsdnet.net/d-web-site/nntp/get-message?newsgroup=digitalmars.D&messageId=%3Cmailman.1085.1296409409.4748.digitalmars-d%40puremagic.com%3E
>
> With script disabled, you'll see the code in a different colored block. With script enabled, you'll see an Edit button there too.
>

That's really cool.


> d) It tries to convert HTML emails back to plain text. (Ironically,
> so it can turn it back to html...)

I love that on so many different levels :)

> h) Already read messages is tracked by your browser - if the link is visited, it puts up a different color url.
>

It's amazing how often people seem to forget that feature exists. That was introduced in what, Mosaic?  Sometimes I think I'm the only one in the world who ever uses the "a:visited" CSS. Not that I feel strongly about it, but hey.


> It doesn't always recognize code. This would be ok, but if it sees one line as code but doesn't include one of them, it would confuse the reader. Example:
>
> http://arsdnet.net/d-web-site/nntp/get-message?newsgroup=digitalmars.D&messageId=%3Cii4lbj%242bes%241%40digitalmars.com%3E
>
> (Look for "auto str =")
>

Ha! I broke your algorithm!

Oh, speaking of fuzzy detection algorithms, it seems to think that the "//" comment tokens are URLs (very, very short URLs ;) ).

> The reason for this is it detects code lines by looking for semicolons and open braces. It will call something a generic <pre> if there's a lot of whitespace in it - figuring it is probably ascii art (if it thinks the whitespace has human significance, it tries to preserve it), but it still isn't a perfect detection function.
>
> I'm open to ideas. We want to detect code, but not flag regular English text.
>

One very rough idea: Take each paragraph (i.e., each block of text that's separated by a full blank line). Run it through a D lexer. If it has at most, say, 1 lexical error per line (on average), then assume it's intended as D code. If multiple consecutive paragraphs are flagged as D code, consider them all part of the same code block.

After all, D's supposed to be fast to lex (and to parse for that matter), and you'd only need to do it once and cache the result. Maybe it could even be tied into some syntax highlighting. Maybe use DDMD (we could use more people on DDMD anyway - Koroskin doesn't seem to have had time for it lately...neither have I for that matter...).
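
A rough Python sketch of that idea, for illustration only. The toy tokenizer below is a stand-in for a real D lexer (it knows nothing about D keywords, character literals, or nested comments), and the names and the threshold parameter are made up for the example.

```python
import re

# Toy token set standing in for a real D lexer.
TOKEN = re.compile(r"""
    \s+                                   # whitespace
  | //[^\n]*                              # line comment
  | "(?:\\.|[^"\\\n])*"                   # string literal
  | [A-Za-z_]\w*                          # identifier or keyword
  | \d+(?:\.\d+)?                         # number
  | [-+*/%=<>!&|^~?:;,.(){}\[\]$]         # one-character operator or punctuation
""", re.VERBOSE)


def lexical_errors(text: str) -> int:
    """Count characters that do not start any known token."""
    pos = errors = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match:
            pos = match.end()
        else:
            errors += 1
            pos += 1
    return errors


def paragraph_is_code(paragraph: str, max_errors_per_line: float = 1.0) -> bool:
    """Flag a paragraph as code if it averages few lexical errors per line."""
    lines = [l for l in paragraph.splitlines() if l.strip()]
    if not lines:
        return False
    return lexical_errors(paragraph) / len(lines) <= max_errors_per_line


if __name__ == "__main__":
    print(paragraph_is_code('int x = 1;\nwriteln("hi");'))
    print(paragraph_is_code("This is plain English text."))
```

With this toy lexer, simple prose also lexes without errors, so the error threshold by itself may not be enough to separate prose from code.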

Actually, what could also be interesting would be an "English parser". Obviously true full-fledged English semantic processing is out of reach ATM, but I wonder if something could be made that acts "good enough" as a mere English-*detector*. Or a general natural-language-detector. Could be an interesting project at the very least.


>
> I'm also open to graphical styling ideas. I put up a dark
> theme here because the white was hurting my eyes, but I change
> on if I like light or dark almost at random. (Depends on the room's
> lighting conditions I think). But I didn't do any more graphic
> setup other than the max-width.
>

I like to use dark themes for my own stuff for the same reasons. But then I always end up going with bright-ish themes for public stuff because I know I'm in the minority on that. (I'm not really trying to suggest one way or another, just commenting.)

>
> BTW, as a fun fact, this post is about 1/4th the size of the entire nntp.d code file!

Viva la D!


January 31, 2011
OT:

> c) It tries to convert news posts to HTML, so the paragraphs wrap to the browser, links work, quotes are put into the proper tags for indentation, and it tries to auto-detect D code and put it in a <pre> block - which my javascript can make inline editable and runnable. Example:
> 
> http://arsdnet.net/d-web-site/nntp/get-message? newsgroup=digitalmars.D&messageId=% 3Cmailman.1085.1296409409.4748.digitalmars-d%40puremagic.com%3E

I accidentally used http://arsdnet.net/d-web-site/nntp/get-message?%20%20newsgroup=digitalmars.D&messageId=%3Cmailman.1085.1296409409.4748.digitalmars-d%40puremagic.com%3E

(note the %20%20 before newsgroup)

So it showed me some Get Message form with <mailman.1085.1296409409.4748.digitalmars-d@puremagic.com> in the message id field.

If I click on Get Message then:

object.Exception: invalid newsgroup
----------------
/var/www/htdocs/d-web-site/nntp(immutable(char)[] nntp.sanitizeNewsgroupName(immutable(char)[])) [0x80ba73b]
/var/www/htdocs/d-web-site/nntp(arsd nntp.Newsreader.getMessage(immutable(char)[], immutable(char)[])) [0x80b84a8]
/var/www/htdocs/d-web-site/nntp(_D4arsd3web42__T17prepareReflectionTS4nntp10NewsreaderZ17prepareReflectionFC4arsd3cgi3CgiZPS4arsd3web14ReflectionInfo1499__T15generateWrapperS1425_D4nntp10Newsreader10getMessageFAyaAyaZC4arsd8database1349__T16SimpleDataObjectVAyaa5_706f737473TS4arsd8database1267__T21StructFromCreateTableVAyaa607_0a09435245415445205441424c4520706f73747320280a09092d2d20616c6c206f6620746865736520617265204d6573736167652d49442076616c7565730a09092d2d204649584d453a2074686973206973206c6961626c6520746f20626520736c6f6f6f6f6f6f77206173207468652064622067726f77730a09096d6573736167654964205641524348415228363029205052494d415259204b45592c0a0909696e5265706c79546f2056415243484152283630292c0a0909746872656164526f6f742056415243484152283630292c0a0a09092d2d20746865206e756d65726963206964656e7469666965722c206966207765206b6e6f772069740a090961727469636c65496420494e54454745522c0a0a090964617465506f7374656420424947494e54204e4f54204e554c4c2c0a0a09096e65777367726f7570205641524348415228343029204e4f54204e554c4c2c0a0a0909617574686f72205641524348415228383029204e4f54204e554c4c2c0a09097375626a65637420564152434841522831323029204e4f54204e554c4c2c0a0a09096d65737361676520544558540a092920454e47494e453d496e6e6f44422044454641554c5420434841525345543d757466383b0a0a09435245415445205441424c45206173736f727465645f6461746120280a0909696420494e5445474552204155544f5f494e4352454d454e542c0a0a09096e616d652056415243484152283830292c0a090976616c75652056415243484152283830292c0a0a09095052494d415259204b4559286964290a09292044454641554c5420434841525345543d757466383b0aVAyaa5_706f737473Z21StructFromCreateTableZ16SimpleDataObjectTS4nntp10NewsreaderVxAyaa10_6765744d657373616765Z15generateWrapperMFPS4arsd3web14ReflectionInfoZDFC4arsd3cgi3CgixHAyaAAyaxAyaZAya7wrapperMFC4arsd3cgi3CgixHAyaAAyaxAyaZAya+0x1dd) [0x80be21d]
/var/www/htdocs/d-web-site/nntp(_D4arsd3web3runFC4arsd3cgi3CgiPS4arsd3web14ReflectionInfoZv+0x384) [0x80bca68]
/var/www/htdocs/d-web-site/nntp(_Dmain+0x2b) [0x80b9b33]
/var/www/htdocs/d-web-site/nntp(extern (C) int rt.dmain2.main(int, char**)) [0x80e4a36]
/var/www/htdocs/d-web-site/nntp(extern (C) int rt.dmain2.main(int, char**)) [0x80e4990]
/var/www/htdocs/d-web-site/nntp(extern (C) int rt.dmain2.main(int, char**)) [0x80e4a7a]
/var/www/htdocs/d-web-site/nntp(extern (C) int rt.dmain2.main(int, char**)) [0x80e4990]
/var/www/htdocs/d-web-site/nntp(main+0x96) [0x80e4936]
/lib/libc.so.6(__libc_start_main+0xe6) [0xf741db86]
/var/www/htdocs/d-web-site/nntp() [0x80b8291]


Strange thing is, most functions are properly demangled but 2 aren't.
Is this a (known) bug?
January 31, 2011
This is great work! looks SO much better than what we have right now.

I'd implement the following filters/parsers for text posts:
1. common human markup such as: _foo_ (underline), *foo* (bold) etc,
2. parse BBCode.

The NG could standardize on BBCode or some other lightweight markup going forward to make this even more straightforward.

January 31, 2011
On 31.01.2011 11:25, foobar wrote:
> This is great work! looks SO much better than what we have right now.
>
> I'd implement the following filters/parsers for text posts:
> 1. common human markup such as: _foo_ (underline), *foo* (bold) etc,
> 2. parse BBCode.
>
> The NG could standardize on BBcode or some other light-weight marking going forward to make this even more straight forward.
>

No BBcode please, I'd still like to be able to (properly) view the posts in Thunderbird.
Else we could entirely switch to phpBB or something like that instead of using a nntp server.
January 31, 2011
"foobar" <foo@bar.com> wrote in message news:ii62n0$1r3i$1@digitalmars.com...
>
> 1. common human markup such as: _foo_ (underline), *foo* (bold) etc,
>

I've never been much of a fan of that. Actually that's one of the things I didn't like about Thunderbird when I tried it: it kept replacing *'s and _'s with formatting even when I was in the supposed "plaintext" mode.

Of course, if the *'s and _'s stay intact when the text is bolded/etc, then I can't say I'd care quite so much.


January 31, 2011
On 31.01.2011 13:19, Nick Sabalausky wrote:
> "foobar"<foo@bar.com>  wrote in message
> news:ii62n0$1r3i$1@digitalmars.com...
>>
>> 1. common human markup such as: _foo_ (underline), *foo* (bold) etc,
>>
>
> I've never been much of a fan of that. Actually that's one of the things I
> didn't like about Thunderbird when I tried it: it kept replacing *'s and _'s
> with formatting even when I was in the supposed "plaintext" mode.
>
> Of course, if the *'s and _'s stay intact when the text is bolded/etc, then
> I can't say I'd care quite so much.
>

This is exactly what my Thunderbird (okay, it's Icedove really) does. The *, _ and / stay intact, but the text is bolded/underlined/italicized.

January 31, 2011
Nick Sabalausky wrote:
> It's amazing how often people seem to forget [a:visited] exists.

Yeah, it boggles my mind - I personally find it incredibly useful. But every design I get for clients invariably has visited colors purposefully indistinguishable from regular links.

Another thing that breaks it for a lot of people is URLs that randomly change ever so slightly, or don't change at all, which throws a wrench in caching too.

I blame AJAX. (Cue someone saying "AJAX doesn't need to break it!" Yeah, I know.)


Speaking of caching, that's something I want to work here, but there's one problem with that: checking for replies means the page's contents might actually change.

I figure I'll set the cache expiry date to coincide with the next NEWNEWS check. New posts won't show up immediately anywhere, but it'll be a little faster to navigate around in the meantime. (I'm thinking about a 30 minute check on .D and .learn, and a one hour check on .announce, since it's slower moving anyway.)
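
Sketched in Python for illustration (the real thing would live in the D code), assuming the checks run at fixed multiples of the interval counted from midnight UTC. The group names other than digitalmars.D and all function names are guesses made for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Check intervals from the plan above; full group names are assumed.
CHECK_INTERVALS = {
    "digitalmars.D": timedelta(minutes=30),
    "digitalmars.D.learn": timedelta(minutes=30),
    "digitalmars.D.announce": timedelta(hours=1),
}


def next_check(now: datetime, interval: timedelta) -> datetime:
    """Return the first interval boundary strictly after `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    periods = (now - midnight) // interval + 1
    return midnight + periods * interval


def expires_header(newsgroup: str, now: datetime) -> str:
    """Format the next check time as an HTTP Expires header value."""
    return format_datetime(next_check(now, CHECK_INTERVALS[newsgroup]), usegmt=True)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("Expires:", expires_header("digitalmars.D", now))
```

Tying the expiry to the next check means a cached page can only go stale if a poll runs later than scheduled.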


> Oh, speaking of fuzzy detection algorithms, it seems to think
> that the "//"
> comment tokens are URLs (very, very short URLs ;) ).

Yea, looks like std.regex.url kinda sucks. It flagged that, but
it didn't match paths in website links. (Maybe I'm doing it wrong?)


> One very rough idea: Take each paragraph (ie, each block of text that's separated by a full newline). Run it through a D lexer. If it has at most, say, 1 lexical error per line (on average), then assume it's intended as D code.

I don't think that will work because a lot of regular sentences would register as a series of variable names. It'd probably have to try at least a rudimentary parse.

(For comparison, consider a jumble of English words. Each piece is a word, so no problem there, but without understanding what they mean, you can't tell if it is a meaningful sentence or not.)


> Actually, what could also be interesting would be an "english parser". Obviously true full-fledged english semantic processing is out-of-reach ATM, but I wonder if something could be made that acts "good enough" as a mere english-*detector*. Or a general natural-language-detector.

I did put a very primitive thing like this in there: it checks for ". " when guessing if it's code or not when not sure. My reasoning is that while periods are common in both, in code it is usually followed by a method name, whereas in English, we usually put a space in there.

I sometimes write ".\n" in code, but ". " is pretty rare in my own usage, outside comments.
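
In Python terms, the tie-breaker amounts to something like the following sketch. The function names and the idea of returning a label string are illustrative assumptions, not the actual code.

```python
def has_sentence_period(line: str) -> bool:
    """True if the line contains a period followed by a space."""
    return ". " in line


def guess_when_unsure(line: str) -> str:
    """Tie-breaker for ambiguous lines: '. ' suggests prose."""
    return "text" if has_sentence_period(line) else "code"


if __name__ == "__main__":
    for sample in ["obj.method(arg).other()", "It works. Mostly, anyway."]:
        print(guess_when_unsure(sample), "|", sample)
```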


Another thing I considered was to check the frequency of capitalized words vs punctuation, or for balanced brackets and stuff like that. Natural language uses a lot of capital letters right after spaces. Code is more likely to be camelCased. There's some crossover ("McDonald's" could flag either way), but looking for bizarre symbols like parens, operators, etc. might disambiguate it.
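
A hypothetical Python sketch of that kind of scoring, just to show the shape of it. The regular expressions, the equal weighting, and the names are all assumptions; none of this is in nntp.d.

```python
import re

CAMEL = re.compile(r"\b[a-z]+[A-Z]\w*")            # camelCase identifiers
CAPITALIZED = re.compile(r"(?:^|\s)[A-Z][a-z]+")   # Capitalized words after whitespace
SYMBOLS = re.compile(r"[(){}\[\]=<>+*/;&|!]")      # operators and brackets


def code_score(text: str) -> int:
    """Positive leans toward code, negative toward natural language."""
    return (len(CAMEL.findall(text)) + len(SYMBOLS.findall(text))
            - len(CAPITALIZED.findall(text)))


def brackets_balanced(text: str) -> bool:
    """Check that (), [] and {} are properly nested and closed."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


if __name__ == "__main__":
    for sample in ["auto myValue = getValue(x);",
                   "Natural language uses Capital letters. Code does not."]:
        print(code_score(sample), brackets_balanced(sample), "|", sample)
```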

However, "line[$-1] == ';'" and friends were so much simpler and so far, seem to give good enough results, so I let it stay at that.
January 31, 2011
Trass3r wrote:
> So it showed me some Get Message form with <mailman.1085.1296409409.4748.digitalmars-d@puremagic.com> in the message id field.

That, by the way, is one of the background features of web.d. If there are insufficient parameters to call a function ("  newsgroup" != "newsgroup", so it thought it wasn't an argument to the function), it automatically generates a form based on the function's arguments, auto-fills what it knows, and lets you fill in the rest.
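
The same behavior can be imitated in a few lines of Python, shown here only as an analogy to what web.d does in D. The handler, the form markup, and the function names are invented for the example.

```python
import inspect
from html import escape
from urllib.parse import parse_qs


def get_message(newsgroup: str, messageId: str) -> str:
    """Stand-in handler; only its signature matters for the demo."""
    return f"{newsgroup}: {messageId}"


def call_or_form(func, query_string: str) -> str:
    """Call func if every parameter is supplied, else build a form."""
    args = {k: v[0] for k, v in parse_qs(query_string).items()}
    params = list(inspect.signature(func).parameters)
    if all(p in args for p in params):
        return func(**{p: args[p] for p in params})
    # Pre-fill the fields we do know and leave the rest empty.
    fields = "".join(
        '<label>{0} <input name="{0}" value="{1}"></label>'.format(
            escape(p), escape(args.get(p, "")))
        for p in params)
    return '<form method="get">{}<input type="submit"></form>'.format(fields)


if __name__ == "__main__":
    good = "newsgroup=digitalmars.D&messageId=%3Cabc%40example.com%3E"
    print(call_or_form(get_message, good))
    print(call_or_form(get_message, "%20%20" + good))
```

In the second call the decoded key is "  newsgroup" with two leading spaces, so it does not match the parameter name and the form is produced with only the message ID filled in.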

The idea there was to define a basic website by doing nothing more than listing some function prototypes. While I find it pretty cool, its "one size fits all" approach is actually fairly useless in practice, alas.


Anyway:

> Strange thing is, most functions are properly demangled but 2
> aren't.
> Is this a (known) bug?

Yes, core.demangle can't do some symbols because DMD applies
a one-way hash to them once they reach a certain length because
such long symbols tend to break linkers.
January 31, 2011
foobar wrote:
> 1. common human markup such as: _foo_ (underline), *foo* (bold) etc,

Yeah, that's a pretty good idea. I agree with the others that it should keep the text symbols (especially since I've seen these algorithms wrongly flag things *a lot*) but a basic implementation is ok.
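
A basic version could look like this Python sketch, which keeps the * and _ characters in the output as suggested in this thread. The patterns are deliberately conservative guesses and the function name is made up.

```python
import html
import re

# Markers must not touch a word character on the outside, so things
# like snake_case_name or "a * b * c" are left alone.
BOLD = re.compile(r"(?<!\w)\*(\w[^*\n]*?)\*(?!\w)")
UNDERLINE = re.compile(r"(?<!\w)_(\w[^_\n]*?)_(?!\w)")


def mark_up(text: str) -> str:
    """Escape text for HTML, then style *bold* and _underline_ spans."""
    out = html.escape(text)
    out = BOLD.sub(r"<b>*\1*</b>", out)      # keep the asterisks visible
    out = UNDERLINE.sub(r"<u>_\1_</u>", out)  # keep the underscores visible
    return out


if __name__ == "__main__":
    print(mark_up("this is *really* _important_"))
    print(mark_up("snake_case_name and a * b * c"))
```

Because the markers stay in the text, a wrong guess only costs some stray bold or underline styling rather than eating characters from the post.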

> 2. parse BBCode.

This probably isn't a good idea... unless it is a web input only filter.

So posts pulled off the news server are treated as plain text - no BBCode parsing is attempted. But posts made through the website may be parsed, and converted to plain text before being forwarded to the news server. (Note that I use my beloved mutt mail client for reading the newsgroups myself, so anything that would break plain text email browsing is a no.)

I already have pretty decent bbcode -> html and html -> text functions in my bag of toys, so regular participants never need to know what kind of input was used.
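
To show the direction of the conversion, here is a small Python sketch that maps a tiny BBCode subset straight to plain text before a post would be forwarded. It skips the intermediate HTML step described above, and the tag set and the plain-text conventions it maps to are assumptions.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# (pattern, replacement) pairs for a small, assumed BBCode subset.
RULES = [
    (re.compile(r"\[b\](.*?)\[/b\]", FLAGS), r"*\1*"),
    (re.compile(r"\[u\](.*?)\[/u\]", FLAGS), r"_\1_"),
    (re.compile(r"\[i\](.*?)\[/i\]", FLAGS), r"/\1/"),
    (re.compile(r"\[url=([^\]]+)\](.*?)\[/url\]", FLAGS), r"\2 <\1>"),
    (re.compile(r"\[url\](.*?)\[/url\]", FLAGS), r"<\1>"),
]


def bbcode_to_text(post: str) -> str:
    """Rewrite known BBCode tags as plain-text conventions."""
    for pattern, replacement in RULES:
        post = pattern.sub(replacement, post)
    return post


if __name__ == "__main__":
    print(bbcode_to_text("[b]Hi[/b] see [url=http://example.com]this[/url]"))
```

Unknown tags pass through untouched in this sketch, so a newsreader user would still see them literally.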

It would let web users feel more at home without impacting everyone else.


The only downside I see is that if people think BBCode is accepted, someone might write it in their newsreader or email client, where it won't be parsed. I don't want the groups to get filled up with bizarre markup everywhere, but the kind of users who use email clients and newsreaders probably won't make that mistake anyway.


So yeah, let's give it a try for web posting and see if it works out.