Announcement

**fthiery** · 08 November 2013, 05:12 AM

Originally posted by sarmad View Post

Am I the only one who feels HTML has become an operating system? A slow and memory intensive operating system that is?

**fthiery** · 08 November 2013, 05:16 AM

Originally posted by riskable View Post

Hey there... I'm the author of Gate One. If you guys have any specific questions about it or the new X11 support just ask away.

Hi there ; how does your X11 implementation work then ? Are you sending compressed video or real X11 transport over ssh, then display with a js based X11 implementation ?

**ssokolow** · 08 November 2013, 06:39 AM

Originally posted by krach View Post

Thanks for clarifying! So indeed:
1.) the ssh password is traveling unencrypted from my computer to the gate one server.
2.) the gate one server needs to process my password.
Maybe you should put a warning somewhere. I guess the only proper setup would be a trusted server (my own) to which I connect over a trusted network (my lan, certainly not the internet). This limits the applicability quite a bit

It does look very impressive though! But I don't really see the use case for me yet.

Given that Safari was listed as working "if you don't use a self-signed certificate", I'm assuming that a normal (non-demo) install uses SSL/TLS to encrypt step #1.

That'd mean you don't need a trusted network. You only need to be sure that:

The machine you're sitting at has no keylogger (unavoidable requirement unless you're booting something like Tinfoil Hat Linux off a LiveCD and entering your password by selecting letters from a randomly-organized grid)
The machine running Gate One is trusted (normally, this would be the same kind of compromise as keylogging the machine you're sitting at since they could just grab things like your SSH private key while installing the keylogger.)

**mikeserv** · 08 November 2013, 07:30 AM

Originally posted by krach View Post

Thanks for clarifying! So indeed:
1.) the ssh password is traveling unencrypted from my computer to the gate one server.
2.) the gate one server needs to process my password.

Maybe you should put a warning somewhere.

This is an entirely preposterous assumption. It might be acceptable if the statements you make were formed as questions, but even that would just mark you as too lazy to open a new browser tab.

This is a public forum linked to from an article discussing new alpha features soon-to-be-implemented in an otherwise arguably stable terminal server application. What makes your statements so ridiculous is you immediately question this application's core security functions - functions pretty much fundamental to any similar application - apparently without even bothering to check the documentation available online or the github source which, in addition to this forum thread, are both linked to in the article!

To spare the application's author the time required to answer your inane questions (with hopes he will sooner provide X11 support), by default the software installs a Tornado web-server configured for access via encrypted HTTPS Websocket. MANY session authentication mechanisms exist, including nic and subnet bind filters, which are configurable locally in json dicts on the server. In my experience the simplest of those offered was simply to authenticate with my Google account - which should be as secure as my Google Wallet account.

As for warnings, the author includes those in the online docs, installed manages, command-line --help options, and previously mentioned json config files noting in all of them that disabling HTTPS is (something to the effect of) "generally a bad idea."

And seriously, who would code an SSH remote access client which transmits in plain passwords in plain text?

-Mike

**riskable** · 08 November 2013, 10:38 AM

Originally posted by krach View Post

Thanks for clarifying! So indeed:
1.) the ssh password is traveling unencrypted from my computer to the gate one server.
2.) the gate one server needs to process my password.
Maybe you should put a warning somewhere. I guess the only proper setup would be a trusted server (my own) to which I connect over a trusted network (my lan, certainly not the internet). This limits the applicability quite a bit

It does look very impressive though! But I don't really see the use case for me yet.

Well, you're wrong about #1: Gate One uses HTTPS (SSL/TLS) encryption. When you run it the first time it will generate a 4096-bit, self-signed RSA certificate/key but you can use whatever certificate you want. All your keystrokes and terminal output are encrypted.

You're also wrong about #2 to an extent: Gate One doesn't "process" your keystrokes at all, really. It just forwards them on to the underlying terminal program. You can audit that code for yourself:

GateOne/gateone/applications/terminal/app_terminal.py at master · liftoff/GateOne

https://github.com/liftoff/GateOne/blob/master/gateone/applications/terminal/app_terminal.py#L1825

Gate One is an HTML5-powered terminal emulator and SSH client - liftoff/GateOne

The "key" line (haha, I kill me) is:

Code:

multiplex.write(chars)

So that function takes 'chars' as an argument (directly from the WebSocket) and it writes it directly to the running terminal program. There's no "processing" of your keystrokes.

Having said that, if you set the log level to "debug" it will log all your keystrokes but when you turn on debug logging it displays warnings all over the place. Users get a HUGE pop-up with a big message warning them that debug logging is enabled and that means their keystrokes will be recorded.

You still have to trust whoever is running the server but that's no different from anything else really. Do you trust that sshd hasn't been modified to record your keystrokes? It's a trivial thing for any admin to do.

**riskable** · 08 November 2013, 10:44 AM

Originally posted by krach View Post

Oh, also the demo server is not reachable via https, only http.

That is correct. I'm not using HTTPS with the demo for three reasons:

1) There's nothing sensitive/private. It doesn't let you SSH and it doesn't take passwords. It's just a tech demo.
2) I don't feel that buying an SSL certificate to demonstrate the product is necessary (people can freak out when they are presented with the, "Are you sure you wish to proceed?" interstitial page).
3) There's a lot of (big) businesses that use proxies that don't work with SSL WebSockets (initially). BlueCoat is to blame for most of this with their "speed bump" mechanism that doesn't know the difference between a WebSocket request and a regular HTTP GET. Worst of all, when the proxy interferes like that the browser won't report a proper error (that I can deal with in the code). It just sort of "never finishes connecting... Forever."

**riskable** · 08 November 2013, 11:02 AM

Originally posted by fthiery View Post

Hi there ; how does your X11 implementation work then ? Are you sending compressed video or real X11 transport over ssh, then display with a js based X11 implementation ?

How it works at a low-level is quite complicated due to a lot of asynchronous/multiprocess code. Having said that, here's a high-level overview:

Gate One connects to the X11 server using XCB and gives it a list of events it wishes to be notified of. These events are the usual stuff you'd expect like, "Window XYZ was just resized and here are the new dimensions" or, "Window ABC just closed." It also asks the X11 server to report "damage" events which are reports of, "what just changed" inside any given window or region of the display.

Damage events include coordinates and dimensions in the form of: X, Y, width, height. These are used by Gate One to grab a screenshot of that exact region of the window or the display (depending on how you've got it configured). Once it's got the raw image data it converts it to something the browser will understand (encoding) such as a PNG or JPEG image. Once the image data is converted it is sent to the client over the WebSocket. The client (browser) then draws the image on a canvas element and the user's view of that window is updated.

Things like mouse clicks and keystrokes work in a similar fashion to how they do with Gate One terminals... Keydown->Gate One server->program.

There's no "video encoding" going on. It's just screenshots of windows. It just happens (thanks to a lot of hard work) that Gate One can capture, encode, and send screenshots fast enough that playing back a video is possible.

Hopefully that answers your question?

**krach** · 08 November 2013, 02:12 PM

@ ssokolow, mikeserv, riskable

Ok. Thanks for enlightening me. I was indeed too quick to jump to conclusions. Sorry for that!

**mikeserv** · 08 November 2013, 06:17 PM

Originally posted by riskable View Post

...Gate One connects to the X11 server using XCB and gives it a list of events it wishes to be notified of. These events are the usual stuff you'd expect like, "Window XYZ was just resized and here are the new dimensions" or, "Window ABC just closed." It also asks the X11 server to report "damage" events which are reports of, "what just changed" inside any given window or region of the display.

Damage events include coordinates and dimensions in the form of: X, Y, width, height. These are used by Gate One to grab a screenshot of that exact region of the window or the display (depending on how you've got it configured)...

Ok, so dirty region updates. As I understand it (in an admittedly limited capacity), this is generally very efficient (and already a standard practice for most remote display protocols) at typical gui window updates and the like but can severely degrade full-fps video transmission, especially as the video display window-size increases and/or fast-moving scenes in the video increase the area per frame that needs updating.

This also presents a couple of hurdles that I can imagine:

1) Unless the region capture is performed at a frequency equal to that of the video framerate then it seems video tearing should be expected to occur at least to some extent pretty much consistently for frames not nearly identical to those preceeding them.

2) To my knowledge (which, again, is prone to err) nearly any video encoded in a codec seriously updated in the past decade or so is likely to have already incorporated similar dirty-region compression methods. Recompressing in this way must be mostly redundant, yes? It seems it could also exaggerate imperfections in the original compression (e.g.: pixellation artifacts presented onscreen when the decompressed video is played on the server display would register damage events, be captured, converted to JPEG/PNG, transmitted to and then processed by clients' browsers by your protocol, each of which steps could potentially introduce further degradation of quality, I suppose). Have you observed anything like this or am I way off base here?

Originally posted by riskable View Post

Once it's got the raw image data it converts it to something the browser will understand (encoding) such as a PNG or JPEG image. Once the image data is converted it is sent to the client over the WebSocket. The client (browser) then draws the image on a canvas element and the user's view of that window is updated.

Do you by any chance distinguish between window elements when capturing/drawing the screens? By this I mean, is it possible to assign each window element its own canvas element, or do you do so already? Wikipedia indicates that the XCB connection you mentioned should provide the capability of distinguishing between the disparate windows on the server-side, allowing for simple compression tricks like only accepting damage event updates from X's reportedly active window for instance. If assigned their own canvasses, it could be possible to implement things things like client-side-only window controls which might, for example, take advantage of browser-specific features such as Chrome's canvas to panel/native window flags, enabling a sort of seamless desktop integration.

Originally posted by riskable View Post

Things like mouse clicks and keystrokes work in a similar fashion to how they do with Gate One terminals... Keydown->Gate One server->program.

There's no "video encoding" going on. It's just screenshots of windows. It just happens (thanks to a lot of hard work) that Gate One can capture, encode, and send screenshots fast enough that playing back a video is possible...

This is kind of what I was driving at above, and I'm especially curious how what you're doing is more capable of streaming video than what most other protocols can reliably do without incorporating special forwarding extensions, despite your description of the process sounding more or less in line with standard practice. Could it instead be possible to recognize multimedia elements and pass them through directly via WebM?

I apologize if this is all way off the mark, I kinda suck at programming myself so I don't always (read: pretty much never) know what I'm talking about, but I do like to read and do I can sometimes acheive a limited conceptual understanding of the inner-workings of things. I'm mostly just interested in reading some more, is all.

Also, I immediately thought of virtual applications when when I first read this article. Could it be possible to do compositing/rendering client-side with multiple servers running in headless vms? Sort of a reverse to the intended paradigm, maybe, but how cool would it be to switch browser tabs and thereby switch OS's?

And last, haven't you done enough? I mean, don't take this the wrong way, but GateOne is already so feature-rich, incorporating so much of what you call "polish," and has become so in such a short-time with no apparent sacrifice in documentation or configurability (to the contrary, I found both a little overwhelming a few months ago when i first tried to set it up)... Don't you do this by yourself, man? How do you find the time to do it all and keep the bugs down? How do you keep it all in your head?

Anyway, best of luck with it all, seriously. Can't wait to try it.

-Mike

**mikeserv** · 08 November 2013, 10:42 PM

Originally posted by riskable View Post

...Gate One connects to the X11 server using XCB and gives it a list of events it wishes to be notified of. These events are the usual stuff you'd expect like, "Window XYZ was just resized and here are the new dimensions" or, "Window ABC just closed." It also asks the X11 server to report "damage" events which are reports of, "what just changed" inside any given window or region of the display.

Damage events include coordinates and dimensions in the form of: X, Y, width, height. These are used by Gate One to grab a screenshot of that exact region of the window or the display (depending on how you've got it configured)...

Ok, so dirty region updates. As I understand it (in an admittedly limited capacity), this is generally very efficient (and already a standard practice for most remote display protocols) at typical gui window updates and the like but can severely degrade full-fps video transmission, especially as the video display window-size increases and/or fast-moving scenes in the video increase the area per frame that needs updating.

This also presents a couple of hurdles that I can imagine:

1) Unless the region capture is performed at a frequency equal to that of the video framerate then it seems video tearing should be expected to occur at least to some extent pretty much consistently for frames not nearly identical to those preceeding them.

2) To my knowledge (which, again, is prone to err) nearly any video encoded in a codec seriously updated in the past decade or so is likely to have already incorporated similar dirty-region compression methods. Recompressing in this way must be mostly redundant, yes? It seems it could also exaggerate imperfections in the original compression (e.g.: pixellation artifacts presented onscreen when the decompressed video is played on the server display would register damage events, be captured, converted to JPEG/PNG, transmitted to and then processed by clients' browsers by your protocol, each of which steps could potentially introduce further degradation of quality, I suppose).

Have you observed anything like this or am I way off base here?

Originally posted by riskable View Post

Once it's got the raw image data it converts it to something the browser will understand (encoding) such as a PNG or JPEG image. Once the image data is converted it is sent to the client over the WebSocket. The client (browser) then draws the image on a canvas element and the user's view of that window is updated.

Do you by any chance distinguish between window elements when capturing/drawing the screens? By this I mean, is it possible to assign each window element its own canvas element, or do you do so already? Wikipedia indicates that the XCB connection you mentioned should provide the capability of distinguishing between the disparate windows on the server-side, allowing for simple compression tricks like only accepting damage event updates from X's reportedly active window for instance. If assigned their own canvasses, it could be possible to implement things things like client-side-only window controls which might, for example, take advantage of browser-specific features such as Chrome's canvas to panel/native window flags, enabling a sort of seamless desktop integration.

Originally posted by riskable View Post

Things like mouse clicks and keystrokes work in a similar fashion to how they do with Gate One terminals... Keydown->Gate One server->program.

There's no "video encoding" going on. It's just screenshots of windows. It just happens (thanks to a lot of hard work) that Gate One can capture, encode, and send screenshots fast enough that playing back a video is possible...

This is kind of what I was driving at above, and I'm especially curious how what you're doing is more capable of streaming video than what most other protocols can reliably do without incorporating special forwarding extensions, despite your description of the process sounding more or less in line with standard practice. Could it instead be possible to recognize multimedia elements and pass them through directly via WebRTC/WebM?

I apologize if this is all way off the mark, I kinda suck at programming myself so I don't always (read: pretty much never) know what I'm talking about, but I do like to read and do I can sometimes acheive a limited conceptual understanding of the inner-workings of things. I'm mostly just interested in reading some more, is all.

Also, I immediately thought of virtual applications when when I first read this article. Could it be possible to do compositing/rendering client-side with multiple servers running in headless vms? Sort of a reverse to the intended paradigm, maybe, but how cool would it be to switch browser tabs and thereby switch OS's?

And last, haven't you done enough? I mean, don't take this the wrong way, but GateOne is already so feature-rich, incorporating so much of what you call "polish," and has become so in such a short-time with no apparent sacrifice in documentation or configurability (to the contrary, I found both a little overwhelming a few months ago when i first tried to set it up)... Don't you do this by yourself, man? How do you find the time to do it all and keep the bugs down? How do you keep it all in your head?

Anyway, best of luck with it all, seriously. Can't wait to try it.

-Mike

Announcement

Open-Source HTML5 Terminal Emulator To Support X11

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment