Converting between screen coordinates and pixels

The documentation says that screen coordinates are not necessarily measured in pixels. It also says that functions that return mouse cursor position operate in screen coordinates. So if I understand correctly, if I want to translate cursor positions into world coordinates of whatever I’m rendering in OpenGL, I should first convert the cursor position from screen coordinates to pixels. However, every example that I came across seems to feed cursor positions directly to OpenGL transforms as if they were already in pixels.

Am I right that a conversion is needed? If so, what is the correct way to do it? Do I just compute it manually based on the ratio between window size and framebuffer size?

If you look at the projection matrix calculations, you will see that they do not depend on the resolution of the framebuffer, only on the aspect ratio (width / height). This is because the output of the vertex shader is in clip space, which after the perspective divide becomes normalized device coordinates: an axis-aligned cube with coordinates ranging from (-1.0, -1.0, -1.0) to (+1.0, +1.0, +1.0). These coordinates are then mapped to window (pixel) coordinates by the viewport transform you set with glViewport.

So you don’t need to worry about converting your cursor position from screen coordinates to pixels if you use the inverse projection transform, because the input to that transform is in clip space. You only need to normalize the cursor position by the window size. You will likely need a transform like:

// Following code untested
double x, y;
glfwGetCursorPos( g_pSys->window, &x, &y );

// Cursor position is in screen coordinates (relative to the window's content area),
// not GL fragment coordinates, so use the window size, not the framebuffer size.
int w, h;
glfwGetWindowSize( g_pSys->window, &w, &h );

vec4 clip_space;
clip_space.x = 2.0f * (float)x / (float)w - 1.0f;
clip_space.y = 1.0f - 2.0f * (float)y / (float)h;  // flip y: cursor y grows downward
clip_space.z = -1.0f;  // -1 for the near plane, +1 for the far plane
clip_space.w = 1.0f;
// For a perspective projection, divide the result by its w component afterwards.
vec4 view_space = mInverseProjFromView * clip_space;
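
As a rough illustration of why the units cancel out, here is a minimal sketch without any GLFW or OpenGL dependency. The names Vec2 and cursorToNdc are made up for this example. The only assumption is that the cursor position and the size passed in use the same units, either both screen coordinates or both pixels.

```cpp
// Hypothetical helper types and names, for illustration only.
struct Vec2 { double x, y; };

// Map a cursor position to normalized device coordinates in [-1, +1].
// cursorX/cursorY and width/height must be in the same units
// (all screen coordinates, or all pixels). Assumes width and height are nonzero.
Vec2 cursorToNdc(double cursorX, double cursorY, double width, double height)
{
    Vec2 ndc;
    ndc.x = 2.0 * cursorX / width - 1.0;   // left edge -> -1, right edge -> +1
    ndc.y = 1.0 - 2.0 * cursorY / height;  // cursor y grows downward, NDC y grows upward
    return ndc;
}
```

Calling this with a cursor at (400, 300) in an 800x600 window gives the same result as calling it with the pixel position (800, 600) and a 1600x1200 framebuffer, because only the ratio matters.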

Some projections depend only on the aspect ratio (like the perspective projection), but some don’t (like the orthographic projection), so unless I misunderstood, your answer doesn’t completely address my question.

As a side question: is the aspect ratio of the framebuffer guaranteed to be the same as the aspect ratio of the window? Or could it happen that there are different scale factors in x and y coordinates?

but some don’t (like the orthographic projection)

The orthographic projection does not depend on the resolution either. Its input coordinates define a box in view space (view space is the coordinates after the camera view transform, similar to world space coordinates with a non-rotated camera at the origin). The projection converts this box to clip space, and after the vertex shader the current glViewport setting maps it to viewport space.

This is often confused in tutorials which use the window coordinates as inputs to the orthographic matrix calculation. It’s fairly easy to see why these should be view space coordinates: they define a box around what you want to display.

To make the result of an orthographic projection have the same scale in x and y in your rendered view, it’s usually best to ensure that the aspect ratio of the box is the same as the aspect ratio of your window.
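
To make the orthographic case concrete, here is a small sketch of going from normalized device coordinates back to view space for the x and y axes. OrthoBox, ViewPoint and ndcToView are hypothetical names for this example, and the box members mirror the left/right/bottom/top arguments of glOrtho. No window or framebuffer size appears anywhere in it.

```cpp
// Hypothetical types and names, for illustration only.
struct OrthoBox  { double left, right, bottom, top; };  // view-space bounds, as passed to glOrtho
struct ViewPoint { double x, y; };

// Invert the x/y part of an orthographic projection:
// NDC -1 maps to left/bottom, NDC +1 maps to right/top.
ViewPoint ndcToView(double ndcX, double ndcY, const OrthoBox& box)
{
    ViewPoint p;
    p.x = box.left   + 0.5 * (ndcX + 1.0) * (box.right - box.left);
    p.y = box.bottom + 0.5 * (ndcY + 1.0) * (box.top   - box.bottom);
    return p;
}
```

With a box of left = -10, right = 10, bottom = -5, top = 5, the centre of the window (NDC 0, 0) lands on view-space (0, 0) and the top-right corner (NDC 1, 1) lands on (10, 5), whatever the resolution.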

is the aspect ratio of the framebuffer guaranteed to be the same as the aspect ratio of the window?

No (though I can’t recall seeing any real world situation with it being different, and many window APIs have only one scale value for both x and y). However this doesn’t change the way you convert cursor coordinates to clip space and then to view space and finally world space.
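
If you do need pixels for something else (for example an API that takes pixel rectangles, such as glScissor or glReadPixels), a per-axis scale covers the case where x and y differ. This is only a sketch: Scale2, Pixel2, windowToFramebufferScale and cursorToPixels are made-up names, and the sizes are assumed to come from glfwGetWindowSize and glfwGetFramebufferSize.

```cpp
// Hypothetical types and names, for illustration only.
struct Scale2 { double x, y; };
struct Pixel2 { double x, y; };

// Ratio between framebuffer size (pixels) and window size (screen coordinates),
// computed separately per axis in case the two differ.
Scale2 windowToFramebufferScale(int winW, int winH, int fbW, int fbH)
{
    Scale2 s = { 1.0, 1.0 };  // fall back to 1:1 if the window size is zero (e.g. minimized)
    if (winW > 0) s.x = static_cast<double>(fbW) / winW;
    if (winH > 0) s.y = static_cast<double>(fbH) / winH;
    return s;
}

// Convert a cursor position from screen coordinates to pixels.
// The origin stays at the top-left corner; no y flip is applied here.
Pixel2 cursorToPixels(double cursorX, double cursorY, Scale2 s)
{
    Pixel2 p = { cursorX * s.x, cursorY * s.y };
    return p;
}
```

Note that OpenGL's own window coordinates have their origin at the bottom-left, so a pixel y value usually still has to be flipped against the framebuffer height before it is passed to such calls.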


I don’t understand what you mean. Here’s the matrix of orthogonal projection, it clearly depends on width and height, not just on the aspect ratio. If I set my projection to orthogonal projection with twice the framebuffer size what I see on my screen is half of the original size as it should be.

The coordinates of glOrtho are all view space coordinates.

So if you have three points, p0, p1, p2 which define the vertices of a triangle, and you want to view this triangle with glOrtho then you do something like the following:

// untested code

float left    = min(p0.x, min(p1.x, p2.x) );
float right   = max(p0.x, max(p1.x, p2.x) );
float bottom  = min(p0.y, min(p1.y, p2.y) );
float top     = max(p0.y, max(p1.y, p2.y) );
float nearVal = min(p0.z, min(p1.z, p2.z) );
float farVal  = max(p0.z, max(p1.z, p2.z) );

glOrtho( left, right, bottom, top, nearVal, farVal );

If you want to ensure the aspect ratio is the same as your window, then you modify either the left and right or the bottom and top coordinates to maintain your desired aspect ratio.
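
One possible way to do that adjustment is sketched below. Bounds and fitToAspect are hypothetical names, and the choice to grow the box around its centre (rather than shrink it or anchor a corner) is an assumption made for this example.

```cpp
// Hypothetical type and name, for illustration only.
struct Bounds { double left, right, bottom, top; };

// Expand the box around its centre until width / height equals windowAspect.
// Assumes top > bottom, right > left and windowAspect > 0.
Bounds fitToAspect(Bounds b, double windowAspect)
{
    const double width  = b.right - b.left;
    const double height = b.top - b.bottom;

    if (width / height > windowAspect) {
        // Box is relatively wider than the window: grow bottom/top.
        const double newHeight = width / windowAspect;
        const double centreY   = 0.5 * (b.bottom + b.top);
        b.bottom = centreY - 0.5 * newHeight;
        b.top    = centreY + 0.5 * newHeight;
    } else {
        // Box is relatively taller than the window (or equal): grow left/right.
        const double newWidth = height * windowAspect;
        const double centreX  = 0.5 * (b.left + b.right);
        b.left  = centreX - 0.5 * newWidth;
        b.right = centreX + 0.5 * newWidth;
    }
    return b;
}
```

The window aspect here can be computed from the window size in screen coordinates, since it is a ratio. The resulting values would then be passed to glOrtho in place of the original left, right, bottom and top.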


Sure, but the arguments I need to pass to glOrtho to see what I want to see depend on the framebuffer size which is exactly what you say next:

I agree, but I need to take the actual framebuffer size into account (not just the aspect ratio) since it’s going to affect how much of my screen the triangle will occupy.


I stated that they depend on the aspect ratio, not the actual size.

The quantity “how much of my screen” is a ratio, and since it’s a proportion of your window size you can use window coordinates.

All of this is moot, however. The question you asked is whether you need to transform mouse coordinates to pixels before translating them to world coordinates, and the answer is that you do not: the code from my first reply is what you need in all cases.
