11:10 karolherbst: pmoreau: I think the biggest issue we have right now is that the arch specific things are spread all over the place. But afaik we don't have half regs on nvc0+ and I don't think we ever emit such
12:42 pmoreau: karolherbst: When preparing the second series, I will be reworking my existing patches to take place during the legalisation passes rather than changing the NIR frontend to adapt to different targets, especially regarding those half registers.
12:42 karolherbst: yeah.. not sure
12:42 karolherbst: I'd prefer to move everything into nir tbh :D
12:43 karolherbst: all of those things are easier in nir
12:44 karolherbst: this entire half register concept is broken anyway...
12:46 pmoreau: I’d also prefer moving everything into NIR, but that’s a much larger change than what I was thinking for that second series. :-D
12:46 karolherbst: :D
12:46 karolherbst: well.. not everything, but yeah..
12:46 karolherbst: there are still regressions and everything
12:47 karolherbst: getting rid of the TGSI stuff would be the first step anyway
12:48 pmoreau: Well, we do not want to get rid of TGSI before we have NIR fully working, otherwise people might not be super happy about it.
12:49 karolherbst: yeah
12:49 karolherbst: but we already use it for turing :D
12:49 karolherbst: I am fairly confident it works, but I think it might make sense to deal with regressions and move one gen at a time over
12:49 karolherbst: until we are done or so
12:50 pmoreau: Oh, do we? I think I had missed that.
12:50 karolherbst: yeah.. volta as well
12:50 karolherbst: Ben and I were too lazy to write those assembly functions
12:50 karolherbst: so we just use Nir
12:50 pmoreau: I guess mostly looking at Tesla for Nouveau, I have missed what happened for those new fancy architectures! :-D
12:52 pmoreau: Once I get basic mostly working (not sure if I want to get images working first or leave them for later), I would not be opposed to moving the compute stuff to be a lot more NIR based.
12:53 karolherbst: yeah..
12:54 karolherbst: but working on nir regressions with nv50 would be helpful
13:41 pmoreau: Indeed
13:43 pmoreau: Is there a way we could do the register allocation with Nir? That could be something I would look into relatively soon, as some of the remaining issues for basics are shortcomings in the current register allocation code.
14:07 karolherbst: pmoreau: no
14:07 karolherbst: but
14:07 karolherbst: we have a lib for it
14:07 karolherbst: src/util/register_allocate.h
14:07 karolherbst: it's a completely different approach, but it might allow us to get rid of our own flawed RA
14:08 karolherbst: there is just one issue
14:08 karolherbst: we do more than just RA in RA
14:08 karolherbst: we have reg zer0 handling there
14:08 karolherbst: tex stuff
14:08 karolherbst: and potentially more things
14:08 karolherbst: we also have some hacks for better reg placement to deal with limm mad forms and stuff
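A minimal sketch of the "completely different approach" mentioned above, assuming it refers to graph-coloring allocation (as far as I know, that is what src/util/register_allocate.h implements). This is a toy, not the Mesa API: `color_interference_graph` and its arguments are hypothetical names, and it handles none of the extras listed above (zero register, tex constraints, placement hacks). It reports failure instead of spilling.

```python
# Toy graph-coloring register allocator. NOT the API of
# src/util/register_allocate.h; every name here is hypothetical.

def color_interference_graph(interference, num_regs):
    """Assign a register index to every value, or return None on failure.

    interference: dict mapping a value name to the set of value names that
                  are live at the same time (must be symmetric).
    num_regs:     number of physical registers available.
    """
    # Simplify: repeatedly remove a node with fewer than num_regs neighbours.
    remaining = {n: set(adj) for n, adj in interference.items()}
    stack = []
    while remaining:
        candidate = next(
            (n for n, adj in remaining.items() if len(adj) < num_regs), None
        )
        if candidate is None:
            return None  # a real allocator would have to spill here
        stack.append(candidate)
        for other in remaining.pop(candidate):
            remaining[other].discard(candidate)

    # Select: pop nodes and give each the lowest register its already
    # coloured neighbours do not use.
    assignment = {}
    while stack:
        node = stack.pop()
        taken = {assignment[n] for n in interference[node] if n in assignment}
        assignment[node] = min(r for r in range(num_regs) if r not in taken)
    return assignment


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(color_interference_graph(triangle, 3))
print(color_interference_graph(triangle, 2))
```

With three mutually interfering values, three registers are enough and two are not, so the second call returns None.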
14:17 glennk: how is nir's support for mixed precision fp32/fp16?
14:19 glennk: in particular is there anything for propagating required precision from fbos and textures through shaders?
14:20 karolherbst: glennk: it's all explicit
14:20 glennk: ok, so mostly pointless then
14:20 karolherbst: what do you mean?
14:21 glennk: a pretty fair first order approximation of number of shaders using precision modifiers out in the wild: 0 :-)
14:22 karolherbst: ahh
14:22 karolherbst: well it's more common with gles
14:22 glennk: even there its a bit sketchy
14:23 karolherbst: yeah well
14:24 karolherbst: we have to do what the shader is telling us :D
14:24 glennk: and things like gnome-shell just set precision highp float
14:24 karolherbst: there are precision annotations, but yeah...
14:25 karolherbst: anyway, we get all the info the shader contains
14:28 glennk: well, the idea is you also need to look at the types of the bound fbos and textures, and compile variants for 8 bit vs 32 bit
14:30 karolherbst: mhhh
14:30 karolherbst: do we have to support that with nv30?
14:30 karolherbst: but the output in nir is typed
14:30 karolherbst: I just think it's probably always a vec4?
14:30 karolherbst: not sure
14:30 karolherbst: maybe others know
14:30 karolherbst: or have similar problems
14:32 glennk: for fragment shaders on nv3x fp16 is ~twice the rate of fp32
14:32 karolherbst: anyway... nir in nv30 was just a fun project idea, because atm we don't do any optimizations :D one other path would be glsl -> nir -> tgsi
14:33 karolherbst: glennk: mhh, but we have to do it from within shaders?
14:33 glennk: nv4x its a little less pronounced but still significant, also nv4x has some odder fixed point formats which may or may not be faster
14:33 karolherbst: okay, but with TGSI we also output fp32, no?
14:33 karolherbst: what we could do is shader variants and just compile fp16 versions on demand
14:34 glennk: the current backend always outputs fp32, except for some odd ops like branches which cargo cult fp16
14:34 karolherbst: "decl_var shader_out INTERP_MODE_SMOOTH vec4 out_1 (VARYING_SLOT_COL0.xyzw, 1, 0)" is what we get normally
14:34 karolherbst: but I suspect it could also be a fp16 vec4
14:34 karolherbst: ehhh wait
14:34 karolherbst: that's vp
14:34 karolherbst: "decl_var shader_out INTERP_MODE_NONE vec4 gl_FragColor (FRAG_RESULT_COLOR.xyzw, 0, 0)"
14:35 glennk: how do you think one determines if fp16 or fp32 is required for correctly computed output?
14:35 karolherbst: st/mesa does?
14:37 glennk: i think you are mixing up explicit precision attributes, which i just stated are basically of no practical use, with determining what accuracy the computations require to get correct output
14:37 karolherbst: I understood what you mean
14:37 karolherbst: soo.. we get the glsl ir and we convert it into nir within st/mesa
14:38 karolherbst: but st/mesa could also force the output to be a fp16
14:38 karolherbst: or we do
14:38 karolherbst: we get the nir handed in and can change whatever we want
14:38 glennk: i don't think you quite see the point i am trying to make
14:41 karolherbst: I think I do. So you mean if the fbo is of a fp16 format, we could also return fp16 values from within the fp or not?
14:41 karolherbst: I honestly don't see the issue, as we could just do that
14:41 glennk: i seem to have lost you somewhere along the way
14:41 karolherbst: ohh you also mean for input data?
14:43 glennk: so for an example if you have something with no precision qualifiers ie effectively vec4 etc in shaders are floats, then to compute an exactly equivalent result using at least one operation reduced to fp16
14:44 glennk: in order to determine that you have to look at the range and precision of the actual inputs and outputs the shader gets at runtime
14:44 karolherbst: right
14:44 karolherbst: so you mean basic fp32->fp16 opts
14:45 karolherbst: so if we know, the inputs are actually fp16, there is no point doing fp32, because it's slower
14:45 karolherbst: in case we can make sure that the fp16 ops would give the same end result
14:46 glennk: that would be a wildly incorrect assumption
14:46 glennk: you have to propagate the errors through the shader for each op
14:46 karolherbst: sure
14:47 glennk: its like value range propagation, but including precision
14:47 glennk: and you pre-fill the input and output types and ranges
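A rough sketch of what "value range propagation, but including precision" could look like, assuming a simple worst-case absolute error model. `Bound`, `rounded`, `add`, `mul` and `blend` are hypothetical names and the model ignores denormals and overflow. The check against half an 8-bit step is only a crude criterion: staying below it does not guarantee bit-identical output.

```python
from dataclasses import dataclass

FP16_EPS = 2.0 ** -11  # unit roundoff of fp16 (round to nearest)
FP32_EPS = 2.0 ** -24  # unit roundoff of fp32


@dataclass(frozen=True)
class Bound:
    lo: float   # smallest value the exact result can take
    hi: float   # largest value the exact result can take
    err: float  # worst-case absolute error accumulated so far

    @property
    def mag(self):
        return max(abs(self.lo), abs(self.hi))


def rounded(lo, hi, err, eps):
    """Add the rounding error of one operation done with unit roundoff eps."""
    mag = max(abs(lo), abs(hi)) + err
    return Bound(lo, hi, err + mag * eps)


def add(a, b, eps):
    return rounded(a.lo + b.lo, a.hi + b.hi, a.err + b.err, eps)


def mul(a, b, eps):
    corners = [a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi]
    err = a.mag * b.err + b.mag * a.err + a.err * b.err
    return rounded(min(corners), max(corners), err, eps)


def blend(eps):
    """0.5 * a + 0.5 * b for two inputs in [0, 1], every op done at eps."""
    a = rounded(0.0, 1.0, 0.0, eps)  # pre-filled input range, rounded once
    b = rounded(0.0, 1.0, 0.0, eps)
    half = Bound(0.5, 0.5, 0.0)      # exactly representable constant
    return add(mul(a, half, eps), mul(b, half, eps), eps)


HALF_UNORM8_STEP = 0.5 / 255.0  # pre-filled output requirement
print(blend(FP16_EPS).err, blend(FP16_EPS).err < HALF_UNORM8_STEP)
print(blend(FP32_EPS).err, blend(FP32_EPS).err < HALF_UNORM8_STEP)
```

The same bookkeeping applied to a texture coordinate scaled by a large texture size would blow past any reasonable threshold at fp16, which is why the decision has to be made per op rather than per shader.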
14:48 karolherbst: yeah...
14:48 glennk: i think in practice just flipping between all inputs/outputs fp16 or fp32 would be enough to be useful most of the time
14:48 karolherbst: probably
14:49 karolherbst: so changing inputs/outputs should be quite easy with shader variants
14:49 karolherbst: we just need to look at the full pipeline
14:49 karolherbst: which is just vp/fp for nv30 anyway
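A minimal sketch of the "flip all inputs/outputs between fp16 and fp32" variant idea, assuming the driver keys compiled variants on a single precision choice derived from the bound texture and framebuffer formats. Everything here is hypothetical: the format names, `FP16_SAFE_FORMATS`, `io_precision`, `VariantCache` and the `compile_fn` callback are illustration only, not nv30 or gallium interfaces.

```python
# Hypothetical set of formats whose values fit in fp16 without loss.
FP16_SAFE_FORMATS = frozenset({"unorm8", "float16"})


def io_precision(bound_formats):
    """Pick one precision for all shader inputs/outputs from the formats of
    every bound texture and render target."""
    if all(fmt in FP16_SAFE_FORMATS for fmt in bound_formats):
        return "fp16"
    return "fp32"


class VariantCache:
    """Compile each (shader, precision) pair at most once, on demand."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # hypothetical backend entry point
        self._variants = {}

    def get(self, shader_id, bound_formats):
        key = (shader_id, io_precision(bound_formats))
        if key not in self._variants:
            self._variants[key] = self._compile(*key)
        return self._variants[key]


cache = VariantCache(lambda shader, prec: f"{shader} compiled as {prec}")
print(cache.get("blur_fs", ["unorm8", "unorm8"]))
print(cache.get("blur_fs", ["unorm8", "float32"]))
```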
14:50 karolherbst: but I think there are attempts at value range optimizations in nir
14:50 glennk: well, can ignore vp on nv30 since its all fp32
14:50 karolherbst: ahh
14:50 karolherbst: like fp32 fp32 or just 32 bit?
14:50 karolherbst: could load 32 bit and split the value
14:51 glennk: see nvfx_shader.h, its not a complicated ISA
14:52 karolherbst: yeah, but I didn't mean it from an ISA perspective
14:53 glennk: are you talking about the interpolants between vs->fs?
14:53 karolherbst: yes, and generic inputs to vs
14:54 karolherbst: like can we pass 2xfp16 values in and between the stages
14:54 karolherbst: within a 32bit varying
14:54 karolherbst: or input/outputs
14:54 glennk: i don't think so?
14:54 karolherbst: why not?
14:54 karolherbst: does the hw manipulate the data?
14:55 glennk: i think the hardware always interpolates fp32
14:55 karolherbst: interpolated inputs and stuff.. if those have to be fp32, okay, sure
14:55 karolherbst: but I also meant generic
14:55 karolherbst: there are generic input/outputs and fp inputs not being interpolated, no?
14:55 karolherbst: at least on later gens we have that
14:55 glennk: those are interpolated too, unless flat shading
14:55 karolherbst: right
14:55 glennk: or do you mean uniforms/constants?
14:56 karolherbst: well uniforms are just passed in
14:56 karolherbst: so those should be fine
14:56 karolherbst: passing values between stages is more interesting
14:57 karolherbst: okay, but we could declare a "flat" fp32 output/input in vs/fs and split it to two fp16 values?
14:57 karolherbst: the annoying bit is just that those things can usually be overwritten and shit :/
14:58 glennk: its pretty rare for flats to be used
14:58 karolherbst: true
14:58 glennk: typically everything is interpolated
14:58 karolherbst: but not all data passed in gets interpolated
14:58 karolherbst: there are generic values
14:59 glennk: there's nothing special about those
14:59 glennk: they get interpolated too according to flat/smooth shade state
14:59 glennk: and perspective/linear
15:00 karolherbst: but not all
15:00 karolherbst: we do have 32 generic varyings
15:00 karolherbst: and they do get used
15:01 karolherbst: I am just wondering if that stuff is supported on nv30
15:01 karolherbst: mhh
15:01 karolherbst: we have "VARYING_SLOT_VAR0_16BIT" stuff
15:01 karolherbst: "32 16-bit vec4 slots packed in 16 32-bit vec4 slots for GLES/mediump."
15:02 glennk: you are talking about nvc0 now right?
15:02 karolherbst: nope
15:02 karolherbst: core stuff
15:02 karolherbst: looking at struct gl_varying_slot
15:02 karolherbst: ehh
15:02 karolherbst: enum
15:03 karolherbst: so if an application does use a generic varying to push data from a vp to a fp, nv30 does or does not have hw support for it? and if we pass 2xfp16 inside a fp32 slot, do we get the raw data or does the hw do anything weird to them?
15:04 glennk: i think use of those are gated behind CAP bits
15:04 karolherbst: doesn't seem to be
15:05 glennk: PIPE_CAP_GLSL_FEATURE_LEVEL if nothing else
15:05 karolherbst: maybe it's just gated on the glsl level
15:05 karolherbst: yeah.. let's see
15:05 glennk: this is GL 2.1 hardware
15:05 karolherbst: generic varying seems to be a gl 2.0 feature
15:06 glennk: yes but fp16 isn't
15:06 karolherbst: so?
15:06 karolherbst: it's raw data
15:06 karolherbst: or not
15:06 karolherbst: so if you pack two fp16 values within a fp32 generic varying, is the hw doing anything funny or just passing the raw data through?
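To make the question concrete, here is a small sketch of packing two fp16 values into one 32-bit slot, and of what happens if something treats that slot as an fp32 number on the way (for example by interpolating it). The helper names are hypothetical and this models plain IEEE arithmetic in Python, not what nv30 hardware actually does.

```python
import struct


def pack_2x_fp16(a, b):
    """Return a 32-bit word with a in the low half and b in the high half."""
    lo, hi = struct.unpack("<HH", struct.pack("<ee", a, b))
    return lo | (hi << 16)


def unpack_2x_fp16(word):
    """Inverse of pack_2x_fp16: returns the tuple (a, b)."""
    return struct.unpack("<ee", struct.pack("<HH", word & 0xFFFF, word >> 16))


def as_fp32(word):
    """Reinterpret the 32 raw bits as an fp32 value."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


def from_fp32(value):
    """Reinterpret an fp32 value as 32 raw bits."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def midpoint_as_fp32(word_a, word_b):
    """What a unit that believes the slot holds fp32 would do halfway
    between two vertices."""
    return from_fp32(0.5 * (as_fp32(word_a) + as_fp32(word_b)))


raw = pack_2x_fp16(1.0, -2.0)
print(hex(raw), unpack_2x_fp16(raw))  # untouched bits round-trip

mixed = midpoint_as_fp32(pack_2x_fp16(1.0, 1.0), pack_2x_fp16(3.0, 3.0))
print(unpack_2x_fp16(mixed))          # not (2.0, 2.0)
```

If the bits are passed through untouched the two halves survive; as soon as the slot is interpolated as fp32 the halves are garbage, which is why the trick would only be usable on a flat varying.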
15:07 karolherbst: nir does have support for this packing as it seems
15:07 karolherbst: so hence me wondering
15:07 karolherbst: or well.. st/mesa
15:09 glennk: where would this packing come from?
15:09 glennk: are you manually packing bits into a vec2 in GLSL?
15:09 karolherbst: we have nir_lower_mediump.c
15:09 glennk: again, where is this packing coming from?
15:10 glennk: there are no such functions in the GLSL versions that nv4x can support
15:10 karolherbst: what do you mean with that? you have fp16 values and don't want to do a fp16 to fp32 conversion just to pass it into the fp
15:11 karolherbst: forget about GLSL for a moment
15:11 karolherbst: this is all mesa internal
15:11 karolherbst: so we find opportunities to use fp16 inside a vp
15:11 karolherbst: for whatever reason
15:11 karolherbst: be it a texture being actually fp16 or anything else
15:11 glennk: forget about vertex programs here
15:12 karolherbst: and have a fp16 value we want to pass into the fragment program
15:12 glennk: nv3x/nv4x its all 32 bit floats always
15:12 glennk: i am talking about texture sampling and framebuffer operations
15:12 karolherbst: ahhh
15:12 karolherbst: okay
15:13 glennk: vp/fp passing is a bit of a side track - on other hardware its probably more relevant
15:13 karolherbst: then if you talk about texture sampling and fb ops, how is that relevant to the shader?
15:14 glennk: think about it for more than 2 seconds and you'll figure it out :-)
15:14 glennk: maybe if i paste you an actual shader it'll be clearer?
15:14 karolherbst: probably
15:22 glennk: https://pastebin.com/raw/M7jsEhKk
15:24 glennk: random gnome-shell shader, picked as its a bit of a dog on the old 6600 i'm testing with
15:25 karolherbst: okay.. sure, but what fp16 ops do we actually support in nv30? just texturing or also alu?
15:25 glennk: so on this isa its a bit weird
15:26 glennk: registers are fp32, but you can set a bit on ops to write them as fp16, its still "stored" as fp32 but uses half the register port bandwidth
15:26 karolherbst: okay sure
15:26 glennk: and then there's a separate bit for precision of op inputs and the op itself
15:26 karolherbst: so we more or less have a full fp16 alu, just use fp32 regs
15:26 karolherbst: can we operate on hi/lo bits?
15:27 glennk: no, i'm not even sure if fp16 and fp32 are even aliased or separate register files
15:27 karolherbst: ahh
15:27 glennk: but conversions are "free"
15:27 karolherbst: well..
15:27 karolherbst: shouldn't be complicated in hw
15:28 karolherbst: okay
15:28 karolherbst: glennk: but then I don't see the problem once a texture is marked as a "fp16" thing
15:30 glennk: the problem is there is no propagation algorithm implemented
15:31 karolherbst: oh mhh... well
15:31 glennk: and computing the precision bars for each op is a bit of a research grade issue
15:31 karolherbst: right.. as shaders are still fp32
15:32 karolherbst: so if you multiply two fp16 values you still expect fp32 precision
15:32 glennk: final output of the shader equivalent to as if it was all fp32
15:32 glennk: to be a bit more precise about the wording :-)
15:33 karolherbst: the question is.. are there fp16 ops which don't lose precision?
15:33 karolherbst: compared to doing it in fp32
15:33 karolherbst: but in the end it only matters what glsl expects
15:33 karolherbst: and that's almost nothing
15:33 karolherbst: we could probably just turn the whole thing to fp16 and nobody would notice
15:34 glennk: say you are mixing colors coming from two 8 bit textures
15:34 karolherbst: ehh.. but that's int
15:34 glennk: and you'd be wrong to assume that because then things like texture coordinates come in and need fp32
15:34 karolherbst: int is a totally completely different story
15:34 glennk: and the award for missing the point goes to... :-p
15:35 karolherbst: ohh coordinates
15:35 karolherbst: mhh
15:35 glennk: by 8 bit i mean unorm so 0 - 1.0 range just to clarify
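A quick numeric check of the two cases being contrasted here, assuming 8-bit normalized colours in the 0 to 1.0 range and, as an illustrative choice not taken from the discussion, a 2048-texel-wide texture. Python's `struct` format `e` is used to round to IEEE fp16; `to_fp16` is a hypothetical helper.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest fp16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Every 8-bit normalized colour value survives a round trip through fp16.
colours_survive = all(
    round(to_fp16(v / 255.0) * 255.0) == v for v in range(256)
)

# Texel centres of a 2048-wide texture do not: many collapse onto the
# same fp16 value.
size = 2048
centres = [(i + 0.5) / size for i in range(size)]
distinct_in_fp16 = len({to_fp16(c) for c in centres})

print(colours_survive)
print(distinct_in_fp16, "distinct coordinates out of", size)
```

So colour math fed by 8-bit textures has headroom at fp16, while the coordinates used to fetch those colours generally do not.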
15:35 karolherbst: ahh
15:36 karolherbst: I guess nv30 doesn't support int textures anyway...
15:37 karolherbst: glennk: what we could do is to define "safe" usages and just propagate it this way
15:37 karolherbst: we do have search helpers
15:37 glennk: how do you know what is safe and not?
15:38 karolherbst: it's glsl, not CL
15:38 karolherbst: I don't think that for fp16 outputs it really matters
15:39 karolherbst: if an op requires fp32, because it's... a hard fp32 then yes, it stays fp32
15:39 karolherbst: but why would it matter otherwise?
15:39 karolherbst: comparing to constants would be annoying
15:39 karolherbst: but those are hard fp32 ops then
15:39 karolherbst: you could e.g. back propagate
15:39 karolherbst: start with the output
15:40 glennk: i did say value range propagation with precision added previously?
15:40 karolherbst: yeah, but I don't think we have to go that far
15:41 karolherbst: a slight change in result is okay by glsl
15:41 glennk: define slight change?
15:42 glennk: do you mean like some minor subtexel shifting, or a framebuffer value being off by 1/255?
15:42 karolherbst: if you add an opt and the result is different, but not buggy, it's fine
15:43 karolherbst: heck.. even the same op in vp and fp is allowed to have different values
15:43 karolherbst: unless you specify in the shader they absolutely should have the same
15:44 karolherbst: unless it looks "wrong" it's fine to optimize
15:44 glennk: how do you intend to algorithmically determine "wrongness" ?
15:44 karolherbst: we don't and don't have to
15:45 karolherbst: so what happens is, that some CI systems just ping when a trace reports different results
15:45 karolherbst: and somebody looks at it and either acks it or not
15:45 karolherbst: that's how it's done
15:45 karolherbst: again.. it's not CL
15:45 karolherbst: if the result changes due to opts nobody cares unless it looks wrong
15:46 glennk: are you still thinking in terms of a global shader fp32/fp16 switch?
15:46 karolherbst: I never did
15:46 glennk: then i'm not sure how you make the decision for each op in the shader
15:47 karolherbst: you start with results and go up the sources until you hit something where you don't want to drop from fp32 to fp16
15:47 karolherbst: don't have to do the full value range thing for this
15:48 karolherbst: if you store a fp32 add as fp16 anyway, you can just make it a fp16 op, and so on
15:48 karolherbst: comparing with fp32 constants might be a point you don't want to do it or something
15:48 karolherbst: and stop propagating
15:49 karolherbst: or if the result is used also for fp32 ops
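The backward walk described above can be sketched on a toy SSA list. This is only an illustration under stated assumptions: the instruction tuples, the op names and the `hard_fp32_ops` set are hypothetical, and a real pass would work on NIR rather than on Python tuples.

```python
def demote_to_fp16(instrs, fp16_outputs,
                   hard_fp32_ops=frozenset({"cmp_const", "tex_coord"})):
    """Return the set of values that can be computed as fp16.

    instrs:        list of (dest, op, srcs) in SSA order (defs before uses).
    fp16_outputs:  values that get stored as fp16 anyway.
    hard_fp32_ops: ops we refuse to demote; propagation stops there.
    """
    users = {}
    for dest, _op, srcs in instrs:
        for src in srcs:
            users.setdefault(src, []).append(dest)

    fp16 = set()
    # Start at the results and walk up towards the sources.
    for dest, op, _srcs in reversed(instrs):
        if op in hard_fp32_ops:
            continue
        consumers = users.get(dest, [])
        if not consumers and dest not in fp16_outputs:
            continue  # fp32 output or dead value: leave it alone
        # Demote only if nothing that reads this value still needs fp32.
        if all(c in fp16 for c in consumers):
            fp16.add(dest)
    return fp16


shader = [
    ("a",    "tex",       ()),
    ("b",    "tex",       ()),
    ("sum",  "add",       ("a", "b")),
    ("flag", "cmp_const", ("sum",)),   # compare against an fp32 constant
    ("col",  "mul",       ("a", "b")),
    ("out",  "add",       ("col", "b")),
]
print(sorted(demote_to_fp16(shader, fp16_outputs={"out"})))
```

Here `out` and `col` are demoted, while `a`, `b` and `sum` stay fp32 because they also feed the fp32 comparison.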
15:55 glennk: last i checked nir output wasn't great for vector ISAs, has that changed recently?
15:56 karolherbst: why shouldn't it be great for vector ISAs?
15:56 glennk: tgsi and nv3x/4x are pretty close to 1:1
15:56 karolherbst: all ops can be used vectorized
15:57 glennk: one comparison point is nir on r600g where its pretty sub par compared to tgsi
15:57 karolherbst: mhh, interesting
15:58 karolherbst: I wouldn't see why it's a problem there though. you can just have vec4 input/outputs if you want to
15:58 karolherbst: it only gets annoying once you start doing packing
15:58 glennk: well r600g is vliw really
15:58 karolherbst: ehhh
15:58 glennk: nv3x is really vector
15:58 karolherbst: yeah.. vliw...
15:58 glennk: as in only vector, no scalar ops
15:58 karolherbst: sure
15:59 karolherbst: no problem
15:59 glennk: also you get 512 instructions for vertex shaders
15:59 karolherbst: then you have vec4 values
15:59 karolherbst: okay.. instruction limit is kind of an issue
15:59 glennk: fp on nv4x is a bit more generous, 64k i think?
15:59 karolherbst: but the idea of using nir was to get optimizations, so you can potentially execute larger shaders
16:00 glennk: its kinda pointless if it spits out scalar code
16:00 karolherbst: but anholt works on using nir between glsl ir and tgsi
16:00 karolherbst: why would it?
16:00 karolherbst: nir isn't scalar or vectorized, it's both
16:00 karolherbst: and you choose how it should behave
16:00 karolherbst: if you want to keep it vectored, fine
16:00 karolherbst: you just keep it vectored
16:01 karolherbst: there are even passes to merge vector ops if not all channels are always used and such
16:01 karolherbst: we also have nir_opt_vectorize
16:01 glennk: but as far as i can tell it can still spit out scalar and vec2/3 etc ops that the backend needs to handle?
16:02 karolherbst: well.. worst case the unused channels contain garbage, no?
16:02 karolherbst: but yeah.. after optimizations there can be dead channels
16:03 glennk: worst case the backend is untested complicated garbage
16:03 karolherbst: could be
16:03 karolherbst: I just said it would be a fun project :p
16:03 karolherbst: not that we should definitely do it
16:03 glennk: right, i'm just sort of weighing pros and cons
16:04 karolherbst: yeah...
16:04 glennk: tgsi translation is lightweight, nir is not
16:04 karolherbst: the main idea would be to get more optimized shaders to make things like gnome-shell less heavy on the hw
16:04 glennk: for this particular architecture
16:04 karolherbst: well
16:04 karolherbst: we can cache shaders
16:04 karolherbst: but yeah
16:04 karolherbst: but so is every optimizing compiler
16:05 glennk: may be a better option to use nir_to_tgsi
16:05 karolherbst: might be
16:05 karolherbst: glennk: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8044
16:05 glennk: the optimizations don't seem to be helping r600g nir much compared to what sb was doing
16:05 glennk: as a comparison point
16:06 karolherbst: okay sure, but r600 always had some sort of optimizations happening, no?
16:06 karolherbst: and nv30 is just... nothing, or not?
16:07 glennk: well, if you disable sb you get something similar to nv30
16:07 karolherbst: sure, but we don't have anything like sb, do we?
16:07 karolherbst: we could also start adding our own IR to nv30 and do optimizations and everything
16:07 karolherbst: would be another option
16:08 karolherbst: don't see the point of adding yet another IR
16:08 glennk: me neither
16:09 karolherbst: I think that gtn and ntt should be the cheapest way for us, but this MR is so huge and impacts so much, I doubt it will land any time soon
16:09 karolherbst: and for nv50 and nvc0 it's probably easier to just drop TGSI then
16:09 karolherbst: instead of having to rely on gtn+ntt
16:09 karolherbst: as we already support nir there
16:10 glennk: also if it increases register usage that basically means all the desktop composition shaders stop compiling
16:10 karolherbst: yeah... that's an annoying scheduling issue
16:10 glennk: so either nv30 would have to gain a register allocator
16:10 karolherbst: glennk: just use src/util/register_allocate.h :p
16:11 karolherbst: some even use it directly on the nir and turn ssa values into registers
16:11 karolherbst: I think
16:11 karolherbst: the question is just.. can we spill on nv30
16:11 glennk: hah, no
16:11 karolherbst: figures
16:12 karolherbst: but yeah..
16:12 karolherbst: scheduling is an issue
16:12 karolherbst: not sure if there are nir based schedulers already to reduce register usage
16:12 glennk: less of an issue on nv30
16:12 karolherbst: I meant instruction scheduling
16:12 karolherbst: ehh..
16:12 karolherbst: reordering
16:12 glennk: yeah, not so much of an issue on nv30 hardware
16:13 glennk: you mean program dependence graph in the compiler
16:13 karolherbst: well.. if that's what is called? I mean the thing where you reorder instruction so your register usage goes down or so
16:13 karolherbst: or stalls go down
16:13 karolherbst: or something
16:13 glennk: emitting instructions in an order to minimize overlapping live ranges
16:13 karolherbst: yeah
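A small sketch of why emission order matters for register usage, assuming straight-line code and a conservative model in which a result never reuses the register of a source that dies in the same instruction. `max_live` and the instruction tuples are hypothetical; this measures the pressure of a given order and does not implement a scheduler such as nir_schedule.c.

```python
def max_live(instrs):
    """Peak number of simultaneously live values for a given emission order.

    instrs: list of (dest, srcs) in emission order. Values that are never
            read are treated as outputs and stay live until the end.
    """
    last_use = {}
    for idx, (_dest, srcs) in enumerate(instrs):
        for src in srcs:
            last_use[src] = idx

    live = set()
    peak = 0
    for idx, (dest, srcs) in enumerate(instrs):
        live.add(dest)                 # result becomes live
        peak = max(peak, len(live))
        # Sources whose last reader is this instruction die afterwards.
        live -= {s for s in srcs if last_use[s] == idx}
    return peak


loads_first = [
    ("t0", ()), ("t1", ()), ("t2", ()), ("t3", ()),
    ("s0", ("t0", "t1")),
    ("s1", ("t2", "t3")),
    ("r",  ("s0", "s1")),
]
interleaved = [
    ("t0", ()), ("t1", ()), ("s0", ("t0", "t1")),
    ("t2", ()), ("t3", ()), ("s1", ("t2", "t3")),
    ("r",  ("s0", "s1")),
]
print(max_live(loads_first), max_live(interleaved))
```

Both orders compute the same result, but consuming each pair right after it is produced keeps fewer values alive at once.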
16:13 karolherbst: I think there are experiments for that
16:13 karolherbst: somewhere
16:13 karolherbst: let's see
16:14 karolherbst: nir_schedule.c
16:15 glennk: sb's gcm pass is pretty powerful there
16:16 karolherbst: yeah..I bet
16:16 karolherbst: it was all pre nir afaik
16:16 glennk: yeah its a binary r600 ISA to binary optimizer
16:17 karolherbst: uhh
16:17 karolherbst: it operates on the binary directly?
16:17 glennk: yeah, thats why it can't handle too many temps
16:17 karolherbst: oh wow
16:17 glennk: the ISA can only encode so many
16:19 glennk: anyway, for nv30/nv40 fragment shaders, to improve performance relative to whats currently in nouveau, 1. reduce number of registers used, its similar to how later architectures get more warps in flight
16:20 glennk: and 2. reduce precision to fp16 when possible
16:21 glennk: most shaders running on this old hardware will be short enough that there's not a lot of ops for an optimizer to bite into
16:54 karolherbst: mupuf, RSpliet, pmoreau: wrote something important for a change: https://lists.freedesktop.org/archives/nouveau/2021-August/039142.html would be cool if you'd comment on that. Ben and Lyude are already aware of this. And I hope I didn't forget to ping somebody else
17:22 karolherbst: pmoreau: btw.. seems like that forks taking forever is a huge issue... maybe I can figure out how to solve it, but...
17:22 karolherbst: ohh.. I have an idea..
17:23 karolherbst: well.. not really though
17:23 pmoreau: karolherbst: Re register allocation in Nir: ah true we have all those hacks. Mmh, I’ll leave that be for now and still have plenty of other stuff to do.
17:23 karolherbst: pmoreau: you came to the same conclusion as I did :)
17:24 pmoreau: :-D
17:24 pmoreau: Re your email: I will try to reply to it today (or before end of week at the latest), but sounds good to me! Does Ben prefer to Ack/Nack things on the ML?
17:24 karolherbst: no clue
17:24 karolherbst: but we can't submit to drm on gitlab
17:25 karolherbst: not yet at least
17:26 pmoreau: Re the fork: Not too surprised in a way, given how big the Linux kernel is with all its history. Maybe trying to use the submodule one could work better for that? But it would have other issues as well.
17:26 karolherbst: pmoreau: well.. that process takes like 5 seconds on github
17:26 pmoreau: Ah
17:26 karolherbst: but
17:26 karolherbst: on github all forks are within the same repository
17:26 karolherbst: so it's really just adding branches on the remote
17:27 karolherbst: gitlab does a real fork
17:27 karolherbst: but I am not sure that it still has to take so long...
19:40 glennk: huh, what sets up the vertex attribute mapping VS -> FS in nv30 when its not taking the nv30_draw_vbo path?
19:47 karolherbst: glennk: do we even have any other draw path?
19:48 karolherbst: guess the only other path where it would matter would be a 3d blitter
19:48 glennk: s/draw_vbo/render_vbo/
19:49 karolherbst: glennk: that's called from within nv30_draw_vbo
19:49 glennk: conditionally called
19:50 glennk: for the cases its not called, there appears to be nothing setting up attribute mapping
19:50 karolherbst: mhh
21:55 RSpliet: karolherbst: ack. I have no thoughts yet, I'll let it simmer for a bit and see if I have anything constructive to say/add
21:55 karolherbst: k
22:02 RSpliet: I think the goal is good. I think using a platform that supports pull requests could help keep track of smaller patches easier, so I approve of that too. CI is good, but tricky on a tight budget. There's a lot of things I just agree with.
22:03 karolherbst: yeah well.. I wouldn't have written it if people wouldn't agree :p
22:04 RSpliet: Maybe I wouldn't push for another layer of indirection. I feel like what makes it into *the new drm/nouveau tree* should be good to funnel upwards sort-of-blindly. Perhaps Ben, Dave or someone should invest in training Lyude and you to do good reviews to a point that they trust your judgement
22:05 Lyude: RSpliet: yeah I had mentioned something about ben not scaling
22:06 Lyude: and i'm happy to do reviews for nouveau, especially since my plate is finally clearing up (albeit slowly)
22:09 RSpliet: Lyude: Glad to hear you're anticipating a bit more bandwidth!
22:09 Lyude: yeah :), I am excited to get back to working on stuff like nouveau a lot more
22:10 RSpliet: The #1 nouveau-killer is lack of peoplepower, so I'm excited about that too :-D
22:14 karolherbst: RSpliet: I have drm-misc commit rights ¯\_(ツ)_/¯
22:15 karolherbst: and I already used those powers!
22:15 karolherbst: danvet already agreed to the idea anyway
22:16 karolherbst: more or less
22:16 karolherbst: I talked with skeggsb about it and Ben requested to be able to nack the whole thing or something
22:16 karolherbst: that's pretty much the only reason I put it in there
22:17 karolherbst: or at least "chance to nack"
22:17 karolherbst: and I think this trust can only be built up by actually doing it
22:19 karolherbst: I can also just push things without skeggsb knowing it, but I thought that might be too drastic and I never wanted to have this "we are doing it this way, end of story" situation to begin with. If skeggsb wants to have a chance to nack stuff, then so be it
22:19 karolherbst: worst case I post the series two weeks earlier, hear nothing from skeggsb and push anyway or something.. dunno
22:19 karolherbst: but at least I want to have this conversation
22:20 karolherbst: and it would be best to have it on the thread :p
22:27 mupuf: karolherbst: with gitlab, skeggsb could "approve" the series, to indicate he checked it out :)
22:28 mupuf: quickly reading your email, I am pretty pleased to see where things are going
22:28 mupuf: and I would like to congratulate Ben for trusting you guys
22:28 karolherbst: mupuf: for example
22:29 karolherbst: but the idea isn't that skeggsb checks every MR
22:29 karolherbst: just the final inclusion thing
22:29 karolherbst: because that would be kind of pointless if he would check out every MR anyway
22:29 mupuf: karolherbst: then just pinging him before merging is easy too
22:29 karolherbst: yeah.. something
22:29 mupuf: I'll write a proper answer, but I'm all for Gitlab and CI there
22:29 karolherbst: nice
22:30 mupuf: as for the "how long does it take to fork nouveau" thing, not sure I really care... How bad is it anyway?
22:30 karolherbst: maybe skeggsb is happy with me "announcing to merge into drm-misc-next" 1 or 2 weeks before
22:30 karolherbst: dunno
22:30 karolherbst: mupuf: 10-15 minutes
22:30 mupuf: karolherbst: meh then. It's something for gitlab to improve. If fdo is fine with the disk usage, I'm fine with it!
22:30 karolherbst: it's not _bad_ but bad enough that some might lose interest in doing an MR
22:31 karolherbst: but they can always send out emails soo... ¯\_(ツ)_/¯
22:31 mupuf: and I guess people will just have their Linux tree anyway
22:31 karolherbst: mupuf: sure, but then they can't do MRs against nouveau
22:31 karolherbst: not talking about git operations
22:31 karolherbst: I meant the fork on the UI
22:31 mupuf: You sure?
22:32 karolherbst: how can you do a MR on gitlab otherwise?
22:33 mupuf: well, there is "change branch"
22:33 mupuf: but indeed, it does not seem to be that flexible :s
22:33 karolherbst: the UI is broken anyway
22:33 karolherbst: :D
22:33 mupuf: anyway, bed time!