Line spacing inconsistencies when pasting to Firefox from gedit

I was adding some productions to escena.org when I noticed something strange about how the info section of the latest productions looked. I was slightly distracted by the excitement of watching all those intros and demos I had never heard of before, so it took me a while to notice exactly what "didn't quite look right". It was the spacing!

Or, more precisely, the double spacing I was seeing instead of the single line breaks of the original!

Initially expecting the blame to be mine, I examined my back-end code first, but there is nothing in it that deals with double spacing or "Windows spacing" correction, such as automatically replacing \r\n with just \n. I then remembered that when I developed this section a year ago I was using Chrome as my main browser, so I wondered whether a Firefox-only bug had slipped through back then. I repeated my test with Chrome: I opened a FILE_ID.DIZ with gedit, copied the text, pasted it into the corresponding textarea and clicked Save... and the result came out OK! PHP received simple single line endings, as one would expect.
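For reference, this is roughly what that kind of normalization would look like. It's a hypothetical helper (the name `normalize_line_endings` is made up), not code from the site. Note that even if the back end did this, it wouldn't hide the problem: a doubled \r\n\r\n would simply become \n\n.

```php
<?php
// Hypothetical helper, NOT part of the escena.org back end:
// turn Windows (\r\n) and old Mac (\r) line endings into Unix (\n) ones.
function normalize_line_endings(string $text): string
{
    // str_replace handles the search array in order, so "\r\n" is
    // replaced before any lone "\r", and a pair never becomes two "\n"
    return str_replace(["\r\n", "\r"], "\n", $text);
}

// Only the style of line ending changes; a doubled break stays doubled
echo json_encode(normalize_line_endings("one\r\ntwo\r\n\r\nthree")), "\n";
// "one\ntwo\n\nthree"
```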

So it's a Firefox thing, then, I thought. Not so fast: it gets even more bizarre. But before I get to that, here's a little piece of test code I wrote to make sure nothing else was interfering in the process:


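<?php
// Line-ending inspector: paste some text, press submit, and every "\n"
// (red marker) and "\r" (green marker) in the submitted data is shown.

$output = '';
$pasted_text = '';

if ($_SERVER['REQUEST_METHOD'] === 'POST')
{
    // The text exactly as the browser submitted it
    $pasted_text = $_POST['text'] ?? '';

    // Escape it first, so a pasted "<" or "&" can't break (or inject) markup
    $text = htmlspecialchars($pasted_text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Put a marker in front of every \n, then in front of every \r
    // (the markers contain no \r, so the second pass only hits real ones)
    $text = str_replace("\n", '<span class="n"></span>' . "\n", $text);
    $text = str_replace("\r", '<span class="r"></span>' . "\r", $text);

    $output = 'You pasted this:<br /><pre>' . $text . '</pre><br />(end of your paste)';
}

?><!DOCTYPE HTML>
<html>
    <head>
        <meta charset="utf-8" />
    </head>
    <body>
        <?php echo $output; ?>
        <form action="" method="post">
        <textarea name="text" placeholder="PASTE SOMETHING HERE, press submit" rows="20" cols="80"><?php echo htmlspecialchars($pasted_text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'); ?></textarea><br />
        <input type="submit" value="submit" />
        </form>
        <style>
            pre {
                background: #eee;
                font-family: Monaco,"Andale Mono","Bitstream Mono",Courier,terminal;
                font-size: 12px;
                height: auto;
                line-height: 1;
                padding: 10px;
                width: auto;
                max-width: 99%;
            }

            /* red marker: a \n was here */
            .n {
                padding: 0 10px;
                background: #f00;
            }

            /* green marker: a \r was here */
            .r {
                padding: 0 10px;
                background: #0f0;
            }

        </style>
    </body>
</html>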

(you can download it from here, in case you want to test it too)

What the test code does is simply highlight '\n' characters in red and '\r' characters in green, so I can see exactly what has been added. It's not a full-blown hex editor, but it's more practical for testing in the browser.
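If you do want something closer to a hex editor, a small dump function is enough. This is a sketch with a made-up name (`hex_dump`); in the test page you could print `hex_dump($pasted_text)` inside the `<pre>` instead of the coloured markers. In the output, \r shows up as `0d` and \n as `0a`.

```php
<?php
// Hypothetical helper: print every byte of a string as two hex digits,
// 16 bytes per row, each row prefixed with its offset (like a hex editor).
function hex_dump(string $data, int $per_row = 16): string
{
    $rows = [];
    foreach (str_split($data, $per_row) as $index => $chunk) {
        // "0d0a" -> "0d 0a"
        $hex = implode(' ', str_split(bin2hex($chunk), 2));
        $rows[] = sprintf('%08x  %s', $index * $per_row, $hex);
    }
    return implode("\n", $rows);
}

echo hex_dump("Espero\r\n"), "\n";
// 00000000  45 73 70 65 72 6f 0d 0a
```

Since the output only contains hex digits and spaces, it can go straight into the page without escaping.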

Now take the following test text (please ignore the fact that some characters are garbled because of the encoding; if you don't understand Spanish, it should all look fine to you anyway!):


Esta es una peque€a rutina realizada en Turbo Pascal 6.0.
Est  programada por motivo de una mini-competici¢n que hicimos en el area
SUR.DEMOS de SurNet.
Espero que os guste ;-)

Pasting it into Firefox, I get this:

[Screenshot: pasting into Firefox]

(notice the double spacing: every line break comes through as a doubled \r\n sequence)
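To put a number on it instead of eyeballing colours, a helper like the one below counts each kind of line ending. It's a hypothetical sketch (`line_ending_stats` is a made-up name). Browsers submit textarea line breaks as \r\n, so a good paste of the four-line test text should report only CRLF pairs, and the Firefox paste should report twice as many of them.

```php
<?php
// Hypothetical helper: count the kinds of line endings in a string.
// A "\r\n" pair counts once as CRLF, not as one CR plus one LF.
function line_ending_stats(string $text): array
{
    $crlf = substr_count($text, "\r\n");
    return [
        'crlf'    => $crlf,
        'lone_cr' => substr_count($text, "\r") - $crlf,
        'lone_lf' => substr_count($text, "\n") - $crlf,
    ];
}

$stats = line_ending_stats("a\r\n\r\nb");
echo $stats['crlf'], "\n"; // 2
```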

And with Chrome it's fine:

[Screenshot: pasting into Chrome]

Now for the fun part: what happens if I copy the text from Chrome and paste it into Firefox? Will the spacing double when I press Submit?

[Screenshot: pasting into Firefox, again]

No! This time it works properly! Crazy, isn't it? So I ran more tests to try to pinpoint what's happening.

Test one: I copied from a .txt file that I created on this computer (so it has only Linux line endings) and pasted it into Firefox: WORKS. Pasted into Chrome: WORKS.

Someone reported (for another product) that copying from GNOME Terminal or gedit into Firefox gave them strange results, so I tested GNOME Terminal too. It works in both Firefox and Chrome: no extra line endings.

Test two: I converted the original text file from Code Page 437 (the old MS-DOS encoding) to UTF-8, opened it with gedit and pasted it into Firefox: DOESN'T WORK! Double line spacing again! The conversion only changes the character encoding, so the file keeps its MS-DOS line endings. I guess a \r is a \r is a \r, no matter the encoding (fortunately!).
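A quick byte-level illustration of why the encoding doesn't matter here. This uses hand-written bytes rather than the actual file: in CP437 "ñ" is the single byte 0xA4, while in UTF-8 it is 0xC3 0xB1, but both encodings leave ASCII control characters alone, so the MS-DOS line ending is the same 0x0D 0x0A pair in both.

```php
<?php
// "pequeña" plus an MS-DOS line ending, in two encodings, byte by byte
$cp437 = "peque\xA4a\r\n";      // CP437: ñ = 0xA4
$utf8  = "peque\xC3\xB1a\r\n";  // UTF-8: ñ = 0xC3 0xB1

echo bin2hex($cp437), "\n"; // 7065717565a4610d0a
echo bin2hex($utf8), "\n";  // 7065717565c3b1610d0a

// The letters differ, the trailing \r\n does not
echo substr($cp437, -2) === substr($utf8, -2) ? "same line ending\n" : "different\n";
```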

Test three: I opened the file with GVim, copied everything and pasted it into Firefox: WORKS. Single line spacing, as expected.

Conclusion?

Before drawing a conclusion, I'll summarise the facts:

  1. In all the tests that "fail", the copy-paste source is gedit, and the content is a file with MS-DOS line endings.
  2. It only fails with Firefox.

So this is what I think is happening: gedit gets confused by the MS-DOS line endings, and "something is wrong" with the data it puts on the clipboard when I press Ctrl+C. Firefox then goes a bit crazy when it receives those "wrong" characters (or whatever gedit is storing on the clipboard) and tries to correct them... but ends up doubling the line spacing.
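One way to picture the theory is to model it. The sketch below is a hypothesis, NOT Firefox's (or Chrome's) actual code, and the function names are made up: if a paste target treats \r and \n as two separate line breaks, and the form then submits each break as \r\n, a clipboard containing \r\n turns into \r\n\r\n. With \n-only text both models agree, which would fit the observation that text with Linux line endings pastes fine.

```php
<?php
// Hypothetical model of the suspected bug: every single \r or \n
// is taken as its own line break, and each break is submitted as CRLF.
function simulate_split_paste(string $clipboard): string
{
    $lines = preg_split('/\r|\n/', $clipboard);
    return implode("\r\n", $lines);
}

// Hypothetical model of the expected behaviour: \r\n is ONE line break.
// The \r\n alternative comes first, so it wins over a lone \r.
function simulate_correct_paste(string $clipboard): string
{
    $lines = preg_split('/\r\n|\r|\n/', $clipboard);
    return implode("\r\n", $lines);
}

echo json_encode(simulate_split_paste("a\r\nb")), "\n";   // "a\r\n\r\nb"
echo json_encode(simulate_correct_paste("a\r\nb")), "\n"; // "a\r\nb"
```

This only shows that the symptom is consistent with the theory; it says nothing about what gedit actually puts on the clipboard.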

On the other hand, Chrome seems to be able to "recover" from possibly malformed data and behaves the way we'd expect, without duplicating line breaks.

However, I'm not too sure about this theory. I'd like to see the raw clipboard contents (this time with a proper hex editor) to find out what's in there, but I haven't been able to find any program that does it. And I'm not sure I want to start digging through the X clipboard specifications just to find out how to ask the program that currently owns the clipboard for its raw data.

Do any of you have any pointers? Any ideas (other than telling people not to use gedit!)? A server-side fix isn't really an option: although I can detect whether the user is using Firefox, I have no way to detect whether the text was pasted from gedit! And blindly replacing every \r\n\r\n with a single \r\n won't work either: what if the text file genuinely contains a blank line?
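To make the ambiguity concrete, here is a small sketch (`blind_collapse` is a made-up name). The same received string can stand for two different intended texts, so any rule that only looks at the bytes must be wrong for one of them.

```php
<?php
// The "blind" server-side fix: collapse every \r\n\r\n into a single \r\n
function blind_collapse(string $text): string
{
    return str_replace("\r\n\r\n", "\r\n", $text);
}

// What the user meant, in two different cases
$meant_no_blank   = "Espero\r\nque os guste";      // doubled by the bug
$meant_with_blank = "Espero\r\n\r\nque os guste";  // pasted correctly

// What the server receives is the same string in both cases
$received = "Espero\r\n\r\nque os guste";

// The "fix" can only return one answer, which is wrong for one of them
$fixed = blind_collapse($received);
var_dump($fixed === $meant_no_blank);   // bool(true)
var_dump($fixed === $meant_with_blank); // bool(false)
```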

Decisions, decisions!

Update: I found a gedit bug that might be related to this issue!