Open-Source HTML5 Terminal Emulator To Support X11


  • #46
    Originally posted by krach View Post
    Oh, also the demo server is not reachable via https, only http.
    That is correct. I'm not using HTTPS with the demo for three reasons:

    1) There's nothing sensitive/private. It doesn't let you SSH and it doesn't take passwords. It's just a tech demo.
    2) I don't feel that buying an SSL certificate to demonstrate the product is necessary (people can freak out when they are presented with the, "Are you sure you wish to proceed?" interstitial page).
    3) There are a lot of (big) businesses using proxies that (initially) don't work with SSL WebSockets. BlueCoat is to blame for most of this, with their "speed bump" mechanism that doesn't know the difference between a WebSocket request and a regular HTTP GET. Worst of all, when the proxy interferes like that, the browser won't report a proper error (that I could handle in code). It just sort of "never finishes connecting... Forever."

    Comment


    • #47
      Originally posted by fthiery View Post
      Hi there; how does your X11 implementation work, then? Are you sending compressed video, or real X11 transport over SSH, displayed with a JS-based X11 implementation?
      How it works at a low-level is quite complicated due to a lot of asynchronous/multiprocess code. Having said that, here's a high-level overview:

      Gate One connects to the X11 server using XCB and gives it a list of events it wishes to be notified of. These events are the usual stuff you'd expect like, "Window XYZ was just resized and here are the new dimensions" or, "Window ABC just closed." It also asks the X11 server to report "damage" events which are reports of, "what just changed" inside any given window or region of the display.

      Damage events include coordinates and dimensions in the form of: X, Y, width, height. These are used by Gate One to grab a screenshot of that exact region of the window or the display (depending on how you've got it configured). Once it's got the raw image data it converts it to something the browser will understand (encoding) such as a PNG or JPEG image. Once the image data is converted it is sent to the client over the WebSocket. The client (browser) then draws the image on a canvas element and the user's view of that window is updated.

      Things like mouse clicks and keystrokes work in a similar fashion to how they do with Gate One terminals... Keydown->Gate One server->program.

      There's no "video encoding" going on. It's just screenshots of windows. It just happens (thanks to a lot of hard work) that Gate One can capture, encode, and send screenshots fast enough that playing back a video is possible.
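      The flow described above — damage rects in, cropped screenshot out over the WebSocket — can be sketched roughly like this in Python. To be clear, these function names and the message layout are my illustration, not Gate One's actual internals:

```python
import base64
import json

def bounding_box(rects):
    """Coalesce a batch of damage rectangles (x, y, w, h) into one
    bounding box, so a single screenshot grab covers them all."""
    x1 = min(r[0] for r in rects)
    y1 = min(r[1] for r in rects)
    x2 = max(r[0] + r[2] for r in rects)
    y2 = max(r[1] + r[3] for r in rects)
    return (x1, y1, x2 - x1, y2 - y1)

def frame_message(window_id, rect, image_bytes, encoding="png"):
    """Frame one region update as a JSON-over-WebSocket message.
    The encoded image travels base64-wrapped, since JSON can't
    carry raw bytes; the client draws it onto its canvas at (x, y)."""
    x, y, w, h = rect
    return json.dumps({
        "window": window_id,
        "x": x, "y": y, "width": w, "height": h,
        "encoding": encoding,
        "data": base64.b64encode(image_bytes).decode("ascii"),
    })
```

      Coalescing a batch of damage events into one rectangle per window is a common trick in this kind of pipeline: it trades a slightly larger capture for fewer encode/send round trips.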

      Hopefully that answers your question?

      Comment


      • #48
        @ ssokolow, mikeserv, riskable

        Ok. Thanks for enlightening me. I was indeed too quick to jump to conclusions. Sorry for that!

        Comment


        • #49
          Originally posted by riskable View Post
          ...Gate One connects to the X11 server using XCB and gives it a list of events it wishes to be notified of. These events are the usual stuff you'd expect like, "Window XYZ was just resized and here are the new dimensions" or, "Window ABC just closed." It also asks the X11 server to report "damage" events which are reports of, "what just changed" inside any given window or region of the display.

          Damage events include coordinates and dimensions in the form of: X, Y, width, height. These are used by Gate One to grab a screenshot of that exact region of the window or the display (depending on how you've got it configured)...
          Ok, so dirty region updates. As I understand it (in an admittedly limited capacity), this is generally very efficient (and already a standard practice for most remote display protocols) at typical gui window updates and the like but can severely degrade full-fps video transmission, especially as the video display window-size increases and/or fast-moving scenes in the video increase the area per frame that needs updating.

          This also presents a couple of hurdles that I can imagine:

          1) Unless the region capture is performed at a frequency equal to the video framerate, it seems video tearing should be expected to occur, at least to some extent, pretty much consistently for frames not nearly identical to those preceding them.

          2) To my knowledge (which, again, is prone to err), nearly any video encoded with a codec seriously updated in the past decade or so is likely to have already incorporated similar dirty-region compression methods. Recompressing in this way must be mostly redundant, yes? It seems it could also exaggerate imperfections in the original compression (e.g., pixelation artifacts shown onscreen when the decompressed video plays on the server display would register damage events, be captured, converted to JPEG/PNG, transmitted to, and then processed by clients' browsers — each of which steps could potentially introduce further quality degradation, I suppose). Have you observed anything like this, or am I way off base here?

          Originally posted by riskable View Post
          Once it's got the raw image data it converts it to something the browser will understand (encoding) such as a PNG or JPEG image. Once the image data is converted it is sent to the client over the WebSocket. The client (browser) then draws the image on a canvas element and the user's view of that window is updated.
          Do you by any chance distinguish between window elements when capturing/drawing the screens? By this I mean, is it possible to assign each window element its own canvas element, or do you do so already? Wikipedia indicates that the XCB connection you mentioned should provide the capability of distinguishing between the disparate windows on the server side, allowing for simple compression tricks like only accepting damage event updates from X's reportedly active window, for instance. If assigned their own canvases, it could be possible to implement things like client-side-only window controls, which might, for example, take advantage of browser-specific features such as Chrome's canvas-to-panel/native-window flags, enabling a sort of seamless desktop integration.

          Originally posted by riskable View Post
          Things like mouse clicks and keystrokes work in a similar fashion to how they do with Gate One terminals... Keydown->Gate One server->program.

          There's no "video encoding" going on. It's just screenshots of windows. It just happens (thanks to a lot of hard work) that Gate One can capture, encode, and send screenshots fast enough that playing back a video is possible...
          This is kind of what I was driving at above, and I'm especially curious how what you're doing is more capable of streaming video than what most other protocols can reliably do without incorporating special forwarding extensions, despite your description of the process sounding more or less in line with standard practice. Could it instead be possible to recognize multimedia elements and pass them through directly via WebM?

          I apologize if this is all way off the mark. I kinda suck at programming myself, so I don't always (read: pretty much never) know what I'm talking about, but I do like to read, and so I can sometimes achieve a limited conceptual understanding of the inner workings of things. I'm mostly just interested in reading some more, is all.

          Also, I immediately thought of virtual applications when I first read this article. Could it be possible to do compositing/rendering client-side with multiple servers running in headless VMs? Sort of a reverse to the intended paradigm, maybe, but how cool would it be to switch browser tabs and thereby switch OSes?

          And last, haven't you done enough? I mean, don't take this the wrong way, but GateOne is already so feature-rich, incorporating so much of what you call "polish," and has become so in such a short time with no apparent sacrifice in documentation or configurability (to the contrary, I found both a little overwhelming a few months ago when I first tried to set it up)... Don't you do this all by yourself, man? How do you find the time to do it all and keep the bugs down? How do you keep it all in your head?

          Anyway, best of luck with it all, seriously. Can't wait to try it.

          -Mike

          Comment



            • #51
              Originally posted by riskable View Post
              Hey there... I'm the author of Gate One. If you guys have any specific questions about it or the new X11 support just ask away.
              What kind of performance can we expect on something with a weaker CPU, like for example, an MSM8660?

              Comment


              • #52
                Originally posted by mikeserv View Post
                Ok, so dirty region updates. As I understand it (in an admittedly limited capacity), this is generally very efficient (and already a standard practice for most remote display protocols) at typical gui window updates and the like but can severely degrade full-fps video transmission, especially as the video display window-size increases and/or fast-moving scenes in the video increase the area per frame that needs updating.
                That is correct: The bigger the window the more pixels it has to send. This is something I'm trying to work around, actually. If a window is presenting damage at a fast enough rate I have it automatically switch to JPEG compression. When that happens I am thinking about setting a flag on that window indicating that the screenshot should be scaled down a bit (say, to a pre-configured setting such as 720p) and then up-scaled at the client. I have no idea what kind of performance impact this would have but it would certainly solve the problem of, "bigger window, lots more bandwidth."

                That's probably something I'll investigate *after* the beta is out, since the point of the X11 feature isn't to play back video. It's merely an acid test.
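                The scale-down idea above can be sketched as a tiny helper. This is hypothetical (the function name and the 720p default stand in for the "pre-configured setting" mentioned), not Gate One's actual code:

```python
def downscale_size(width, height, max_height=720):
    """For a fast-updating window, compute a smaller capture size
    (capped at, say, 720 pixels tall); the client scales the image
    back up to the window's real size when drawing it.
    Aspect ratio is preserved; sizes round to whole pixels."""
    if height <= max_height:
        return (width, height)  # already small enough: send as-is
    scale = max_height / height
    return (round(width * scale), max_height)
```

                The trade-off is exactly the one described: fewer pixels to encode and ship per frame, at the cost of some client-side blur from the upscale.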

                Originally posted by mikeserv View Post
                This also presents a couple of hurdles that I can imagine:

                1) Unless the region capture is performed at a frequency equal to that of the video framerate then it seems video tearing should be expected to occur at least to some extent pretty much consistently for frames not nearly identical to those preceeding them.
                Most videos are 24fps, but I noticed that most video players will push updates to the screen much faster than that. This is why I capped screenshots to 30fps (per window). I too thought this would result in video tearing, but it doesn't. In fact, it works far better than I thought it would. You can see from the video I posted on YouTube that the playback (which was about 720p resolution) is nice and smooth. It's even smooth while you move the window during playback.
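                The per-window 30fps cap can be sketched as a simple rate limiter. The names here are illustrative (Gate One's real implementation may differ); the clock is injectable so the behaviour is easy to test:

```python
import time

class FrameCap:
    """Per-window frame-rate cap: drop a screenshot if the last one
    for that window went out less than 1/max_fps seconds ago."""
    def __init__(self, max_fps=30, clock=time.monotonic):
        self.min_interval = 1.0 / max_fps
        self.clock = clock
        self.last_sent = {}  # window id -> timestamp of last sent frame

    def allow(self, window_id):
        now = self.clock()
        last = self.last_sent.get(window_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: skip this damage batch
        self.last_sent[window_id] = now
        return True
```

                Note that the timestamp is only recorded when a frame is actually sent, so a burst of damage events collapses to at most 30 sends per second per window.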

                Originally posted by mikeserv View Post
                2) To my knowledge (which, again, is prone to err) nearly any video encoded in a codec seriously updated in the past decade or so is likely to have already incorporated similar dirty-region compression methods. Recompressing in this way must be mostly redundant, yes? It seems it could also exaggerate imperfections in the original compression (e.g.: pixellation artifacts presented onscreen when the decompressed video is played on the server display would register damage events, be captured, converted to JPEG/PNG, transmitted to and then processed by clients' browsers by your protocol, each of which steps could potentially introduce further degradation of quality, I suppose). Have you observed anything like this or am I way off base here?
                You are correct that it is redundant compression but I haven't noticed any severe quality problems.

                Originally posted by mikeserv View Post
                Do you by any chance distinguish between window elements when capturing/drawing the screens? By this I mean, is it possible to assign each window element its own canvas element, or do you do so already? Wikipedia indicates that the XCB connection you mentioned should provide the capability of distinguishing between the disparate windows on the server-side, allowing for simple compression tricks like only accepting damage event updates from X's reportedly active window for instance. If assigned their own canvasses, it could be possible to implement things things like client-side-only window controls which might, for example, take advantage of browser-specific features such as Chrome's canvas to panel/native window flags, enabling a sort of seamless desktop integration.
                That's how it currently works in rootless mode: Every window gets its own canvas element. In rooted mode (full desktop) you just get a single canvas element, but the way the X11 DAMAGE extension works means there's very little difference between rooted and rootless mode in terms of bandwidth or CPU utilization with normal application interaction. The big savings in rootless mode comes when you do stuff that only happens on the client side, like moving a window. The X11 server is completely unaware of window positions in rootless mode.

                As far as making applications "seamless": That would require a browser extension. I'll definitely be investigating that once it's working great without an extension.

                Originally posted by mikeserv View Post
                This is kind of what I was driving at above, and I'm especially curious how what you're doing is more capable of streaming video than what most other protocols can reliably do without incorporating special forwarding extensions, despite your description of the process sounding more or less in line with standard practice. Could it instead be possible to recognize multimedia elements and pass them through directly via WebM?
                I have already incorporated support for WebP, which is a subset of WebM, but I don't believe there's any mechanism available for Gate One to detect that a window or region is displaying the content of a particular file. Maybe via D-Bus queries? It is definitely worth investigating.

                Even if today's video players don't incorporate a mechanism to determine (from XCB or another mechanism) precisely what they're playing back it doesn't mean I can't ask them to add something like that. Heck, if it means a significant performance boost I'll submit a patch!

                Originally posted by mikeserv View Post
                I apologize if this is all way off the mark, I kinda suck at programming myself so I don't always (read: pretty much never) know what I'm talking about, but I do like to read and do I can sometimes acheive a limited conceptual understanding of the inner-workings of things. I'm mostly just interested in reading some more, is all.
                You think I'm a good programmer? LOL! I only learned how to program 4.5 years ago. My day job doesn't even involve programming (not *yet* anyway--here's to hoping Liftoff Software "takes off", hah).

                Originally posted by mikeserv View Post
                Also, I immediately thought of virtual applications when when I first read this article. Could it be possible to do compositing/rendering client-side with multiple servers running in headless vms? Sort of a reverse to the intended paradigm, maybe, but how cool would it be to switch browser tabs and thereby switch OS's?
                This is something I've been working on for Gate One as a whole in a private branch. The problem is the JavaScript: There's this one "GateOne" global object and everything is an attribute thereof. I didn't architect it with multiple servers in mind, so it will need a great big overhaul; that's in the works for 2.0.

                Ultimately though, what you're suggesting is indeed very possible. It's just a matter of keeping track of which window belongs where and supporting multiple simultaneous WebSocket connections. I've already got some preliminary zeromq code to keep Gate One servers in sync for when that day comes.

                For reference, there's nothing stopping you from having multiple tabs open to multiple virtualized applications right now. Gate One has a mechanism called, "locations". Every terminal/window/whatever belongs to a specific 'location'. So if you wanted to open, say, LibreOffice Calc in a separate tab you'd just add '?location=whateveryouwant' to the end of the Gate One URL. Example:

                https://gateone.company.com/?location=justcalc

                There are currently no GUI controls for switching locations or moving terminals/windows back and forth between them, but the API is already there and it's awesome. From the JavaScript console you can move terminals/windows on the fly and even switch your active location on the fly. You just call 'GateOne.Net.setLocation("somelocation");' and all your workspaces/terminals/windows/X11 desktops/etc. will instantly be swapped out for the ones at that other location.

                It's pretty cool how well it works (download Gate One and try it!). I really need to add a keyboard shortcut that pops up a little window where you can select and switch locations. It's a really nice way to manage tabs/windows and just plain keep everything organized. It also lets you have as many tabs open as you want, each with their own terminals/apps/desktops/etc.

                BTW: The default location is called 'default' if you want to switch back. Also, to send all your apps/terminals from one location to another just call 'GateOne.Visual.relocateWorkspace(workspaceNumber, newLocation)'.

                Originally posted by mikeserv View Post
                And last, haven't you done enough? I mean, don't take this the wrong way, but GateOne is already so feature-rich, incorporating so much of what you call "polish," and has become so in such a short-time with no apparent sacrifice in documentation or configurability (to the contrary, I found both a little overwhelming a few months ago when i first tried to set it up)... Don't you do this by yourself, man? How do you find the time to do it all and keep the bugs down? How do you keep it all in your head?

                Anyway, best of luck with it all, seriously. Can't wait to try it.

                -Mike
                Not only do I do it myself, I do it as a second job in my spare time. How do I keep the bugs down? I don't know. I do know that I make new ones all the time! hehe.

                Also, I don't keep it "all in my head." Are you kidding me? People report issues all the time with things I haven't worked on in months. When that happens all I can do is refer to my own documentation and extensive code comments that I leave for myself and others for precisely that kind of situation (so I don't forget important facets of the code). There's a reason why every function, method, etc in Gate One has a documentation string: It's so I won't forget, LOL!

                Lastly, while Gate One is currently very feature-rich I'm still only getting started! I have a TODO list a mile long. Not only that but the more I work on it the longer that TODO list gets. This is because I'm often working on some code and I think to myself, "Oooh, I just had a great idea!" Then I either write a little TODO comment right there in the code (they're everywhere) or I open up Evernote and add yet another note to my "Ideas" notebook. Over 1000 so far! Not all are related to Gate One but many of them came to me while I was coding it.

                Comment


                • #53
                  Originally posted by droidhacker View Post
                  What kind of performance can we expect on something with a weaker CPU, like for example, an MSM8660?
                  I'd imagine that it will run great on that CPU! I mean, I am currently working with a customer that is running Gate One on a tiny little TP-LINK TL-WR703N router:

                  http://wiki.openwrt.org/toh/tp-link/tl-wr703n

                  That thing has a 400MHz Atheros AR7240 CPU, only 32MB of RAM, and Gate One is actually very usable. That device doesn't even have an FPU!

                  Not only that but I (just now) had four terminals open on that device doing various things (top, tail on logread, etc) and they were all very responsive. Did I mention that this tiny little router is located in Australia (I'm in the US)? Haha. The average ping time for me hovers around 400ms.

                  Of course, while I was doing all that Gate One was using up about 80% of the device's RAM but what else does it need that RAM for? Certainly not routing tables and firewall rules! Haha.

                  Interesting fact: After the last session expires, Gate One will restart itself into a low-memory/low-resource state that turns off nearly all of its periodic callbacks (background stuff that checks various files for changes, mostly), with essentially zero CPU utilization as well. For example, on that TP-LINK device the gateone.py process's memory utilization drops from about 80% to 50% when the last session times out (which I've configured to occur five minutes after the last user disconnects).
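                  That idle behaviour can be sketched like so. In Tornado terms (the framework Gate One uses), the stopped objects would be PeriodicCallback instances; this sketch uses a generic stand-in with `.start()`/`.stop()`, and the class and names are illustrative rather than Gate One's actual code:

```python
class IdleManager:
    """Drop into a low-resource state when the last session expires:
    stop the periodic background callbacks so the process goes quiet,
    and restart them when a new session arrives."""
    def __init__(self, callbacks):
        self.callbacks = callbacks   # objects exposing .start()/.stop()
        self.sessions = 0
        self.low_power = False

    def session_started(self):
        self.sessions += 1
        if self.low_power:
            for cb in self.callbacks:
                cb.start()           # wake the background tasks back up
            self.low_power = False

    def session_expired(self):
        self.sessions = max(0, self.sessions - 1)
        if self.sessions == 0 and not self.low_power:
            for cb in self.callbacks:
                cb.stop()            # nothing left to watch: go quiet
            self.low_power = True
```

                  On a 32MB router, parking the file-watching callbacks like this is the difference between the process idling at zero CPU and it waking up on a timer forever.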

                  Another thing to think about: I haven't tested this in a while but when you run Gate One under pypy (a JIT-compiling, self-optimizing Python interpreter) it runs even faster. Twice as fast last time I tried it.

                  Comment
