00:07 karolherbst: imirkin: :) I doubt you like my current approach, but it seems to work somewhat
00:07 karolherbst: for fixing the 64 bit type stuff
00:09 karolherbst: I don't even like it myself...
02:35 mischmerz:looks around ...
02:35 mischmerz: Good evening :D
04:27 karolherbst: imirkin: some compute shader which compiles for pre maxwell: https://gist.github.com/karolherbst/9acf9139883befadbe50ce1a48300705
04:28 karolherbst: crash in nv50_ir::GCRA::coalesceValues
04:28 imirkin: :(
04:28 karolherbst: RIG_Node *nVal = &nodes[val->id]; val is NULL
04:28 karolherbst: nouveau_shaderdb/everspace/20170912/3040.shader_test
04:30 imirkin: with nir, or with tgsi?
04:30 karolherbst: tgsi
04:30 karolherbst: the shader code is something like that: imageStore( ci0, int(gl_GlobalInvocationID.x), cu_u[0].x.xxxx)
04:31 karolherbst: seriously though: "uniform uvec4 cu_u[1];"
04:33 imirkin: the code's messed up
04:33 imirkin: the rules for load prop are wrong
04:33 imirkin: for pre-maxwell
04:33 imirkin: we can do shit like
04:33 imirkin: 6: not %p21 sustp BUFFER $r0 $s0 f32 # %r14 c0[0x0] %r15 %r15 %r15 %r19 (0)
04:34 imirkin: the c0[0x0] is an arg to that op iirc
04:34 imirkin: er, more like
04:34 imirkin: 12: not $p1 sustp BUFFER $r0 $s0 f32 # u32 $r0d c7[0x424] $p0 $r4q (8)
04:34 imirkin: however i think on maxwell that is no longer the case
04:34 imirkin: and so we end up with
04:34 imirkin: https://hastebin.com/iheqitomip.pl
04:34 imirkin: which is obviously illegal
04:34 imirkin: and then the RA explodes, but that's not the cause of the issue
04:34 karolherbst: ups
04:35 imirkin: so ... fix insnCanLoad() to not allow loading constbufs into OP_SUSTP for maxwell+
04:36 karolherbst: should be fairly easy....
04:36 imirkin: or fix the load to not generate it in the first place
04:37 imirkin: hm no, it's explicitly propagated
04:37 imirkin: { OP_SUSTP, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0 },
04:37 imirkin: right
04:37 karolherbst: mhh okay, if I disallow loading c[] it indeed works
04:37 karolherbst: yeah
04:37 karolherbst: that I changed
04:37 imirkin: (0x2 == load the thing)
04:37 karolherbst: right
04:37 imirkin: (for arg1)
04:38 karolherbst: wondering if we want a new table for maxwell :(
04:38 imirkin: nice find of such a simple shader - this would have been a pain to work out in a larger shader
04:38 karolherbst: :)
04:38 karolherbst: yeah, compute shaders are usually quite massive
04:38 imirkin: otoh a more complex shader might not have triggered the issue ;)
04:39 imirkin: it's some weird initializing shader
04:40 imirkin: just sticks some uniform into an entire buffer
04:40 imirkin: glClearBufferSubData wasn't good enough for them i guess
04:41 karolherbst: :)
04:41 karolherbst: well that hlsl code is some weirdo stuff
04:41 imirkin: nah, that's just genero helper code
04:41 imirkin: for all their shaders
04:42 karolherbst: yeah right, but why call it "compiler_internal_" ?
04:42 imirkin: i esp like cu_u[0].x.xxxx
04:42 imirkin: take the x attrib, and splat it to all 4 components
04:42 imirkin: for a r32ui image. wtvr.
04:45 karolherbst: do we really want a separate table for maxwell or shall I just move that in a file and do some macro magic to include it twice?
04:45 karolherbst: or should I hard code that condition somewhere
04:46 karolherbst: wondering if something like that could happen for other su ops as well
04:48 imirkin: in some places we fix it up when processing the table
04:48 imirkin: probably - check how they're emitted
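One way to picture the "fix it up when processing the table" option discussed above, as a minimal C++ sketch rather than the actual codegen: the opcode enum, the `OpProps` layout, `buildLoadMasks()` and `canLoadConstbuf()` are hypothetical stand-ins, and the `0x110` Maxwell threshold is an assumption. Only the `0x2` bit for arg1 on `OP_SUSTP` comes from the conversation.

```cpp
#include <array>
#include <cstdint>

// Hypothetical, simplified stand-ins for the opcode list and the
// per-opcode property table; the real names and layout may differ.
enum Op { OP_ADD, OP_SUSTP, OP_COUNT };

struct OpProps {
    Op op;
    uint8_t loadMask; // bit s set: source s may be a direct c[] load
};

// Bit 1 (0x2) marks arg1 as loadable, mirroring "0x2 == load the thing".
static const OpProps kBaseProps[] = {
    { OP_ADD,   0x2 },
    { OP_SUSTP, 0x2 },
};

// Assumed chipset value of the first Maxwell part; illustrative only.
constexpr unsigned kFirstMaxwell = 0x110;

// Build the per-chipset masks from the shared table, then patch the
// entries that differ instead of keeping a second full table.
std::array<uint8_t, OP_COUNT> buildLoadMasks(unsigned chipset) {
    std::array<uint8_t, OP_COUNT> masks{};
    for (const OpProps &p : kBaseProps)
        masks[p.op] = p.loadMask;
    if (chipset >= kFirstMaxwell)
        masks[OP_SUSTP] = 0; // no constbuf operands on sustp for maxwell+
    return masks;
}

// Query used by a load-propagation pass: may source `src` of `op`
// be replaced by a direct constbuf load?
bool canLoadConstbuf(const std::array<uint8_t, OP_COUNT> &masks, Op op, int src) {
    return (masks[op] >> src) & 1;
}
```

Patching one shared table keeps the two generations in sync for every other opcode, at the cost of hiding the differences in code rather than data.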
05:04 karolherbst: uh well, that ain't so bad: total gprs used in shared programs : 318801 -> 365629 (14.69%)
05:04 karolherbst: other numbers are totally uninteresting: total instructions in shared programs : 2183853 -> 2185869 (0.09%)
05:04 karolherbst: I guess that higher gpr usage comes from those immediates
05:06 imirkin: yeah. and higher op count is probably from the extra mov's
05:07 karolherbst: doubtful
05:07 karolherbst: 4133 shaders helped 5907 hurt :)
05:07 karolherbst: it could be everything really
05:08 imirkin: heh
05:08 karolherbst: I am sure those extra movs do something to that, but usually I saw less movs with the nir version
05:08 karolherbst: no idea, RA seems to like it more and adds less silly movs
05:12 karolherbst: so maybe I have a fix for that immediate problem now
05:14 karolherbst: at some point I also plan to throw in some NIR opts to detect some opts we could do as well or things we might want to replace by native codegen opts
05:14 karolherbst: I can imagine that there are some nir opts we don't support for real and most likely won't ever due to not enough time :)
05:17 imirkin: yeah, of course
05:19 karolherbst: mhh
05:20 karolherbst: seems like I already did something by accident so that the immediates are moved
05:21 karolherbst: oh no, I just wrote silly code
06:14 Manoa: it would be nice to have openCL in the nvidia cards
06:15 Manoa: you know last time I tested nouveau was on the GK110B and it worked flawlessly, equally as fast as the nvidia driver, so I guess GL development of nouveau is completed :)
06:15 koz_: I believe karolherbst and mupuf are busy hacking on it.
06:15 Manoa: and mesamatrix also show all done :)
06:17 imirkin: pmoreau's been working on getting opencl going for the past 2 years or so
06:17 koz_: Yeah, it was pmoreau not mupuf, my bad.
06:18 imirkin: we get something like 40-80% of blob perf, depending on the gpu and load
06:20 Manoa: I remember that topic in phoronix, about shader compiler and Z culling
06:20 Manoa: and there were a couple other things
06:20 imirkin: well, the honest answer is we have no idea
06:20 imirkin: we have some theories, esp about things we know we don't do, but hard to measure the impact.
06:22 Manoa: and don't forget: you have something that nvidia doesn't: nine :)
06:23 Manoa: which also worked great on my linux box
06:28 Manoa: if my experience is any indication, I played Left 4 Dead 2 with 5 GB of texture mods and a reshade shader mod on the GK110B at 120 fps, and that was wine 2, I'm sure things would be much better now with wine 3
06:30 Manoa: I think I missed the window for wine 3, I could have given some reports but my system is sorta "stuck" here for now :x
09:30 hakzsam: imirkin: nice work for bindless btw :)
09:51 Trunksistor: Hi all
09:51 Trunksistor: Anyone can help with mmiotrace?
09:53 Trunksistor: I've got a strange behavior working with CUDA and no clue...
10:04 Trunksistor: Anyone used mmiotrace with CUDA?
10:04 karolherbst: Trunksistor: depends on what you are looking for, but mmiotrace shouldn't really help you here, should it?
10:05 karolherbst: pmoreau, mupuf: please change your email provider: "550 spam detected" ....
10:06 karolherbst: ....
10:06 karolherbst: I give up
10:06 Trunksistor: Well if it helps me or not it's a really good question. But the problem is that I've activated the tracer, I've loaded the driver and my pc gome freeze
10:07 Trunksistor: *gone
10:07 karolherbst: yeah
10:07 karolherbst: I wrote a patch for that
10:08 Trunksistor: That's a good news
10:08 karolherbst: Trunksistor: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/mm?h=v4.15-rc7&id=6d60ce384d1d5ca32b595244db4077a419acc687
10:08 karolherbst: it is inside 4.15
10:08 karolherbst: maybe you hit another bug...
10:09 Trunksistor: Nope, I was on 4.14.12
10:09 Trunksistor: But I cannot find info on this bug, my bad
10:09 karolherbst: well there is still the possibility
10:10 karolherbst: you can try 4.15-rc.... something and report back if this helps with your issue or not
10:11 Trunksistor: Yeah, I'll try this morning (italy here)
10:11 karolherbst: well and I need to go to work :)
10:11 Trunksistor: If you are interested in "why", it's for my master's thesis
10:12 Trunksistor: Ok, thanks for help!
10:12 Trunksistor: Have a good day ;)
10:17 orbea: cool, opengl + vdpau with mpv doesn't blow up my system anymore :)
10:18 orbea: well, not the most recent mpv commit since it needs git ffmpeg now...
10:22 orbea: and hwdec in mpv looks amazingly terrible compared to mplayer...
10:22 orbea: like everything is out of focus :P
13:01 robclark: mupuf, imirkin, btw, not sure if you saw https://github.com/envytools/envytools/pull/115 (do folks automatically get notified, or do I have to assign reviewers??).. anyways, not sure if you have opinions about whether that is worth doing? (ie. trying to merge freedreno envytools stuff back to upstream envytools)
13:02 karolherbst: robclark: github has a bot
13:03 karolherbst: robclark: "[Saturday, 6 January 2018] [23:48:10 CET] Notice -envytools to #nouveau- [envytools] robclark opened pull request #115: DONTMERGE: freedreno rebase to upstream envytools, take #1 (master...freedreno-rebase) https://git.io/vNkLU"
13:04 karolherbst: robclark: well in general I am all for it if features are getting merged back
13:04 karolherbst: can't say for register documentation, this might require a different approach in the end
13:04 robclark: oh.. I didn't notice that (but I guess I was without IRC around that time since p.fd.o shells disabled)
13:04 karolherbst: because we don't want to have one repository for all devices
13:05 karolherbst: maybe we should split it?
13:05 karolherbst: and a user could install multiple databases for different devices from different repositories?
13:05 robclark: well, one approach could be just having different directories for diff devices.. or another would be split up git tree w/ common stuff and then per-device git trees..
13:05 robclark: (although that seems like more work)
13:06 karolherbst: robclark: maintainership can get challenging if you split it
13:06 karolherbst: I mean
13:06 karolherbst: if you have all in one
13:06 robclark: anyways, I mostly just want to see what others' opinions are (and to see if it is worth spending more time to backport rnn/headergen stuff)
13:07 robclark: hmm, I think I probably can already push to upstream envytools tree (although haven't tried).. when the github organization was created someone added me to it
13:07 karolherbst: :)
13:07 robclark: having single tree does have the advantage of being a bit simpler for changes that span parts..
13:08 karolherbst: but in the end I think it would be nice if you could just grab an envytools common thing and add your custom stuff to it without worrying too much about rebasing and so on
13:08 karolherbst: true
13:09 robclark: I guess the downside of separate trees.. if someone (for example) makes some rnndec change, what are the odds they are going to remember to update all the users of it in diff trees
13:13 karolherbst: yeah.. it would require some release management
13:13 karolherbst: and I guess nobody wants to do that
13:13 robclark: I suppose it might be solvable w/ some travis.. but in general I'm a fan of not making things harder than they need to be ;-)
13:16 karolherbst: right :)
13:29 karolherbst: robclark: did you ever look into transform feedback within gallium?
13:29 karolherbst: I am currently wondering what the conditions are so that tgsi.stream_output is filled
13:30 robclark: yeah, a3xx+ all support TF.. it's needed for some pretty early gl version..
13:30 karolherbst: mhh
13:30 karolherbst: right
13:30 karolherbst: what I am looking at is, why is it broken for geometry shaders :)
13:31 karolherbst: with nir
13:31 robclark: hmm.. well I suppose I've only done TF w/ VS since no GS yet ;-)
13:32 robclark: karolherbst, ok, it is filled by creatively named st_translate_stream_output_info2().. I think that gets called for VS but must be missing for GS..
13:32 karolherbst: well that isn't even called from glsl_to_nir
13:33 robclark: called from st_translate_vertex_program() for VS
13:33 karolherbst: yeah
13:33 robclark: I think it probably needs to be added somewhere for GS..
13:33 karolherbst: I think geometry shaders are a bit different here
13:33 karolherbst: well it works with TGSI and so on
13:33 karolherbst: let me check something
13:34 karolherbst: yeah
13:34 karolherbst: not called for GS even with TGSI
13:34 robclark: in tgsi case it gets called from glsl_to_tgsi
13:34 robclark: iirc
13:34 robclark: so probably want to add a call in st_translate_geometry_program().. I think.. I'm just looking at it..
13:35 karolherbst: st_translate_stream_output_info might be called for gs
13:35 karolherbst: mhh
13:35 karolherbst: no
13:36 karolherbst: ohh wait
13:36 karolherbst: but it makes sense
13:36 karolherbst: let me check something
13:36 karolherbst: yep
13:36 karolherbst: that is missing
13:36 karolherbst: st_translate_stream_output_info has to be called
13:37 karolherbst: ....
13:37 karolherbst: src/mesa/state_tracker/st_program.c:508
13:37 karolherbst: see that if clause surrounding it?
13:37 karolherbst: :)
13:38 robclark: that is for VS
13:38 karolherbst: yeah
13:38 karolherbst: but this is okay
13:38 robclark: note the st_translate_stream_output_info2() call around 464
13:38 karolherbst: both gets called at VS time
13:38 robclark: but GS needs equiv logic
13:39 robclark: it's all a bit messy.. but if you want give me 10min and I'll give you a patch to try
13:39 robclark:is remembering how this worked again ;-)
13:39 karolherbst: :)
13:44 robclark: karolherbst, do TES/TCS also have TF? For TF it looks like we still need to generate outputMapping which is (for tgsi) done in st_translate_program_common() but never called for NIR..
13:44 karolherbst: I think so, yes
13:45 robclark: k
13:45 karolherbst: or maybe not?
13:45 karolherbst: there are no tests for that in piglit
13:45 karolherbst: so maybe it is fine
13:46 karolherbst: let us only care about GS so far, because we have tests for this
13:46 robclark: hmm, k.. well I'll just add new fxn and call it from st_translate_gs_program().. we might need to add similar call for tes/tcs when that is handled in nir
13:47 karolherbst: :)
13:47 karolherbst: well I am sure there is still some stuff to do
13:47 karolherbst: besides TF
13:47 robclark: sure, just mentioning that since I will forget ;-)
14:00 robclark: karolherbst, I think this should do the trick: https://hastebin.com/raw/ohisatuwow
14:00 karolherbst: doesn't seem to do anything (or I have other bugs as well)
14:01 robclark: hmm, st_translate_geometry_program() not getting called?
14:01 karolherbst: it is
14:01 robclark: try setting breakpoint on st_translate_program_stream_output() and see if it gets hit
14:02 robclark: oh.. hmm.. one sec.. I guess last_vert_prog is probably not what we want..
14:02 karolherbst: :)
14:03 karolherbst: well the function gets called, but mhh
14:04 robclark: karolherbst, https://hastebin.com/raw/gajuxoxaja
14:05 karolherbst: it is getting called too late
14:06 robclark: ?
14:07 karolherbst: let me check something in gdb first
14:07 robclark: hmm, I guess one thing I didn't check was where varying packing was done, but I should hope this happens after (with correct varying locations)..
14:08 karolherbst: mhh
14:09 karolherbst: okay, it seems to be getting called, but the stream_output struct doesn't get filled
14:15 karolherbst: weird
14:16 robclark: what is gl_transform_feedback_info::NumOutputs?
14:17 karolherbst: 1
14:18 karolherbst: mhh I think the part is missing where the object is copied to the driver or something, let me dig through that a bit
14:18 robclark: hmm, ok..
14:18 robclark: oh, yeah, there are probably multiple copies of the state object..
14:19 karolherbst: yeah
14:19 karolherbst: with TGSI we also just have a copy of that one
14:20 karolherbst: but I am not quite sure where the copy is made :(
14:23 robclark: karolherbst, in st_get_basic_variant().. I guess..
14:25 robclark: karolherbst, try https://hastebin.com/raw/zegojigake ?
14:25 karolherbst: :) that looks like it might work
14:25 robclark: that might end up being not quite right for VS..
14:25 karolherbst: vs has its own st_get_basic_variant
14:26 karolherbst: called st_get_vp_variant
14:26 robclark: ahh, right..
14:26 karolherbst: error: incompatible types when assigning to type 'struct pipe_stream_output_info' from type 'struct pipe_shader_state'
14:27 karolherbst: vpv->tgsi.stream_output = stvp->tgsi.stream_output;
14:27 karolherbst: inside st_create_vp_variant
14:27 robclark: ahh, heh, ok, makes sense
14:39 karolherbst: pass
14:39 karolherbst: tgsi.stream_output = prog->tgsi.stream_output;
14:39 karolherbst: :)
14:40 karolherbst: robclark: thanks for your patch. Will do a full piglit run now
14:41 karolherbst: there seems to be something wrong though, I get an unusual amount of crashes
14:41 robclark: hmm..
14:42 robclark: I guess gdb one of crashers, I could quite well have overlooked something simple
14:42 karolherbst: well, we will know in a minute
14:47 mupuf: robclark: I would be all for sharing the same repo
14:47 mupuf: but mwk would have a better insight
14:48 mupuf: now, the question is: How do we detect which rnndb to use
14:49 karolherbst: robclark: ... silly issue
14:49 karolherbst: robclark: prog->sh.LinkedTransformFeedback is NULL
14:49 karolherbst: oh well
14:49 karolherbst: I guess there can be a NULL check around that
14:50 robclark: mupuf, ok.. well I can spend some more time today porting rnn/headergen fixes.. I had rough idea to move things to subdirs so you'd just do things like: 'headergen qcom/adreno.xml' or 'headergen nv/whatever.xml'.. but I can play around with that
14:50 robclark: karolherbst, oh, heh
14:50 robclark: ok, well I guess it is only null in the non-TF case ;-)
14:50 karolherbst: :)
14:51 karolherbst: mupuf: vendor id?
14:51 mupuf: karolherbst: for nva_ that should be good
14:52 mupuf: for lookup, I guess we could have the one file specifying the chipsets common
14:52 karolherbst: well for lookup you are screwed anyway
14:52 mupuf: this way, we can say: lookup -a NV86 fdsfdsfdsf
14:52 karolherbst: we can add pretty names
14:52 karolherbst: yeah
14:52 mupuf: lookup -a A400 ffddd
14:52 karolherbst: or do sume prefixing
14:52 karolherbst: -a NV.GK106
14:52 karolherbst: *some
14:52 mupuf: that too, but I would rather avoid it :D
14:52 karolherbst: well
14:53 karolherbst: you don't want to share a common namespace with everything
14:53 mupuf: as for mmiotrace decoding, that will be interesting
14:53 mupuf: but I guess we can default to nouveau there anyway
14:53 karolherbst: mhhh
14:53 karolherbst: although we know it
14:53 karolherbst: the header tells us the device id
14:53 mupuf: oh, cool then
14:53 robclark: hmm, I don't think I've ever used lookup
14:54 karolherbst: mupuf: kind of
14:54 karolherbst: oh wait, we don't
14:54 imirkin: lookup -a e6
14:54 imirkin: i use it all the time
14:54 mupuf: yeah, that's what I thought
14:54 robclark: anyways, I guess easy enough to differentiate NVxy vs Axyz .. not sure about etnaviv but I guess that would be V<something>?
14:55 karolherbst: mupuf: we get stuff like this in the header: PCIDEV 0100 10de11e0 10 f6000000 e000000c 0 f000000c 0 e001 f7000000 1000000 10000000 0 2000000 0 80 80000
14:55 mupuf: imirkin: e6 will be problematic :s But I guess we could default to nouveau in this case
14:55 mupuf: 10de --> nvidia
14:55 karolherbst: but
14:55 karolherbst: MAP 874.520713 1 0xf6000000 0xffffc90000044000 0x1000 0x0 0 ?
14:55 karolherbst: how do you map this map to the PCIDEV thing?
14:55 karolherbst: ohhh
14:55 karolherbst: f6000000 == 0xf6000000
14:55 mupuf: f6000000 is written right there, in the PCIDEV
14:55 karolherbst: yeah
14:55 mupuf: so, doable
14:56 karolherbst: e000000c size?
14:56 mupuf: sounds enormous
14:56 karolherbst: mhh probably another range
14:56 imirkin: karolherbst: read demmio code for how to parse the traces
14:56 imirkin: note that demmio also handles multiple gpu's in the same trace
14:56 karolherbst: :)
14:56 karolherbst: right
14:57 imirkin: (i don't know what you guys are talking about, just throwing in additional bits of info)
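The MAP-to-PCIDEV matching worked out above can be sketched in C++ as follows. This is not demmio's parser: the helper names are hypothetical, and the field layout is an assumption read off the two sample lines (bus/devfn, vendor+device, irq, seven BAR bases with flag bits in the low nibble, then seven BAR sizes; for MAP, the physical address as the fourth word).

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One PCIDEV header line, reduced to what is needed for matching.
struct PciDev {
    uint16_t vendor = 0;
    uint16_t device = 0;
    uint64_t base[7] = {};
    uint64_t size[7] = {};
};

static std::vector<std::string> splitWords(const std::string &line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    for (std::string w; in >> w;)
        words.push_back(w);
    return words;
}

// Assumed layout: PCIDEV busdevfn vendordevice irq base[7] size[7] ...
bool parsePciDev(const std::string &line, PciDev &out) {
    const std::vector<std::string> w = splitWords(line);
    if (w.size() < 18 || w[0] != "PCIDEV" || w[2].size() != 8)
        return false;
    out.vendor = static_cast<uint16_t>(std::stoul(w[2].substr(0, 4), nullptr, 16));
    out.device = static_cast<uint16_t>(std::stoul(w[2].substr(4, 4), nullptr, 16));
    for (int i = 0; i < 7; ++i) {
        // The low nibble is assumed to carry BAR flags, not address bits.
        out.base[i] = std::stoull(w[4 + i], nullptr, 16) & ~0xfULL;
        out.size[i] = std::stoull(w[11 + i], nullptr, 16);
    }
    return true;
}

// Assumed layout: MAP timestamp id phys virt len ...
bool parseMapPhys(const std::string &line, uint64_t &phys) {
    const std::vector<std::string> w = splitWords(line);
    if (w.size() < 6 || w[0] != "MAP")
        return false;
    phys = std::stoull(w[3], nullptr, 16);
    return true;
}

// Index of the BAR that contains phys, or -1 if none does.
int barContaining(const PciDev &dev, uint64_t phys) {
    for (int i = 0; i < 7; ++i)
        if (dev.size[i] && phys >= dev.base[i] && phys - dev.base[i] < dev.size[i])
            return i;
    return -1;
}
```

With the vendor id recovered this way (10de in the sample), a decoder could pick the matching register database per mapping instead of assuming one vendor for the whole trace.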
14:57 karolherbst: robclark: well that's why I liked the idea of vendor prefixes, because this would just solve the issue
14:57 karolherbst: maybe add an env var with a default prefix
14:57 karolherbst: or config file
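The vendor-prefix idea floated above could look roughly like this C++ sketch. Nothing here exists in envytools: `splitChipName()`, `defaultVendorFromEnv()` and the `ENVYTOOLS_VENDOR` variable are hypothetical names, and falling back to "NV" is just the "default to nouveau" suggestion from the chat.

```cpp
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical default: read a vendor prefix from the environment,
// falling back to "NV" when the variable is unset.
std::string defaultVendorFromEnv() {
    const char *v = std::getenv("ENVYTOOLS_VENDOR"); // made-up variable name
    return v ? std::string(v) : std::string("NV");
}

// Split "NV.GK106" into {"NV", "GK106"}; a bare name such as "e6"
// keeps working and is assigned the default vendor.
std::pair<std::string, std::string>
splitChipName(const std::string &arg, const std::string &defaultVendor) {
    const std::string::size_type dot = arg.find('.');
    if (dot == std::string::npos)
        return { defaultVendor, arg };
    return { arg.substr(0, dot), arg.substr(dot + 1) };
}
```

Under this scheme `lookup -a e6` would behave as before, while something like `-a QCOM.A400` would select a different database explicitly.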
14:57 mupuf: imirkin: merging freedreno's and nouveau's envytools
14:59 robclark: fwiw, I think all the freedreno specific tools all load adreno/aXYZ.xml directly.. although probably since I didn't properly understand varset and chipid stuff ;-)
14:59 imirkin: mmmm
14:59 imirkin: well i'd rather see it as
14:59 imirkin: mv rnndb nvdb
15:00 imirkin: and then lookup -d nvdb -a whtvr
15:00 imirkin: (-d is taken. but you get the idea)
15:00 robclark: and since I don't use rnn/lookup (yet?) I guess we don't have to solve how to handle which vendor immediately.. just keep everything assuming nv for now
15:00 imirkin: and then most people might add ln -s foodb rnndb
15:00 imirkin: robclark: well it's more general, no? it's "which db to use"
15:00 robclark: yeah, that is probably reasonable
15:01 imirkin: robclark: or are you not suggesting dropping the adreno rnndb into the envytools project?
15:01 robclark: I was thinking more along the lines of: mv rnndb/* rnndb/nv
15:01 imirkin: mmmmmmmmmmmmmmmmmmmmmmm
15:01 imirkin: well, that's fine
15:01 imirkin: but only one of those should ever get read in at a time
15:01 imirkin: you never want to process adreno AND nv in the same process
15:02 robclark: imirkin, moving into one git tree was the idea.. at a minimum I'd like to get to point of having same headergen/rnndec/etc but I think is easier to just have everything in single tree (incl rnndb)
15:02 robclark: right
15:02 imirkin: yeah, no objection to it in concept
15:02 imirkin: just want to make sure the integration is non-painful for regular use
15:02 robclark: sure, ofc
15:03 imirkin: you have tools. we have tools. sometimes those tools may make invalid assumptions. etc
15:03 imirkin: ultimately it's mwk's baby
15:03 robclark: I think "keep everything assuming nv and then fix on case by case for shared tools" is reasonable.. or at least that is what I had in mind..
15:03 imirkin: so get him to weigh in
15:04 robclark: when I run headergen I specify file.xml so that should keep working..
15:04 imirkin: i guess we do too
15:04 imirkin: but for the various tools (think cffdump)
15:04 imirkin: they have to "find" themselves
15:05 robclark: I can just change cffdump to look for "qcom/adreno/a5xx.xml" instead of "adreno/a5xx.xml" for now..
15:05 imirkin: ok; that'd be easy enough
15:05 imirkin: not 100% sure how our various tools work
15:06 robclark: there ofc might be a better way.. maybe nvdb and qcdb plus rnndb symlink might be easiest way to start and not impact anyone's workflow.. idk
15:06 karolherbst: robclark: [26012/26012] skip: 10346, pass: 15500, warn: 9, fail: 143, crash: 14 :)
15:06 robclark: \o/
15:07 robclark: karolherbst, I'll let you send patch to list then, since I guess you made one or two additional changes to it
15:07 karolherbst: yeah
15:07 karolherbst: I git am your patch
15:07 robclark: probably cc tarceri
15:07 imirkin: that number of failures sounds comparable to the regular flow
15:07 karolherbst: not quite
15:07 karolherbst: but yeah
15:08 karolherbst: I think we have half with tgsi
15:08 imirkin: wow. go us.
15:08 karolherbst: interpolateAt is the biggest thing now
15:08 karolherbst: and then some random small issues
15:08 imirkin: that stuff was such a pain to understand
15:08 karolherbst: interpolateAt is 11 fails
15:08 imirkin: coz step 1 is ... "figure out wtf interpolation is"
15:08 karolherbst: *are
15:09 imirkin: but it was fun
15:09 karolherbst: well
15:09 karolherbst: interpolation already works
15:09 imirkin: and actually i didn't even totally figure it out
15:09 karolherbst: just not those interpolateAt variants
15:09 karolherbst: :)
15:09 karolherbst: :D
15:09 imirkin: i think i only got a decent understanding when i was fixing some bugs in swr's interpolator
15:09 karolherbst: that at_sample thing is weird
15:09 imirkin: karolherbst: just at_offset with a fixed offset that's looked up
15:10 karolherbst: I see
15:10 karolherbst: well the tgsi code is weird
15:10 imirkin: (except more painful with varying sample locations)
15:10 imirkin: how so?
15:11 karolherbst: uh.. I meant at_offset indeed
15:11 imirkin: insn = mkOp1(OP_PIXLD, TYPE_U32, (offset = getScratch()), fetchSrc(1, 0));
15:11 karolherbst: those min/max/mul/cvt things
15:11 imirkin: that grabs the offset from $magicplace
15:11 imirkin: oh
15:11 imirkin: well
15:11 karolherbst: with the odd constants
15:11 imirkin: the interpolator ain't magic
15:11 karolherbst: :D
15:11 imirkin: sad though it might be
15:11 imirkin: it actually just takes a S0.4 number (i forget exact number of bits)
15:11 imirkin: but we get the offset as a float
15:11 imirkin: so ... all that code is just to clamp it to the valid range, and convert to fixed point
15:12 imirkin: there's a nice comment that describes it
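The clamp-then-convert step described above, as a small C++ sketch. The exact fixed-point layout is not stated in the chat (the bit count is explicitly from memory), so the 4 fractional bits and the [-0.5, 0.4375] range below are assumptions chosen to match the "S0.4" remark, and `offsetToFixed()` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Convert a float pixel offset to signed fixed point.
// Assumed format: 4 fractional bits, representable range [-8, 7] in
// units of 1/16, i.e. [-0.5, 0.4375] as a float.
int32_t offsetToFixed(float offset) {
    const float lo = -0.5f;
    const float hi = 0.4375f; // 7/16, the largest representable value
    const float clamped = std::min(std::max(offset, lo), hi);
    // Scale by 2^4 and round to the nearest integer step.
    return static_cast<int32_t>(std::lrint(clamped * 16.0f));
}
```

The min/max pair corresponds to the clamping and the multiply plus conversion to the "mul/cvt things" mentioned earlier; the odd-looking constants fall out of the fixed-point range.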
15:20 karolherbst: robclark: https://github.com/karolherbst/mesa/commit/9900288931eadec60cff38de4aaa477fec470bfb
15:21 karolherbst: creative use of * incuded
15:21 karolherbst: *included
15:22 robclark: karolherbst, maybe one minor nit.. why not just have if (!prog->sh.LinkedTransformFeedback) return; at top of st_translate_program_stream_output()?
15:22 robclark: other than that, lgtm
15:22 robclark: ie. no point to building up outputMapping[] if no TF
15:23 karolherbst: right
15:24 karolherbst: will send it out after some testing
15:25 robclark: thx
15:31 karolherbst: nice... https://gist.githubusercontent.com/karolherbst/09d0f0e74912211bd36ed6b7e61a6e1a/raw/d06d0b218de1f16228909d2fa6a39d25a16eeff9/gistfile1.txt
15:32 karolherbst: bug: dmesg output messed up when hitting to WARN_ON hits at the same time :3
15:32 karolherbst: *two
15:33 pmoreau: Nice one!
15:34 karolherbst: I think sometimes I get these also with three hits :)
15:40 pmoreau: The more the merrier!
17:47 chillfan: does nouveau have a fix for these issues? https://nvidia.custhelp.com/app/answers/detail/a_id/4611/~/security-bulletin%3A-nvidia-gpu-display-driver-security-updates-for-speculative
17:49 chillfan: or any news/info about that really
18:33 pmoreau: chillfan: I would guess, that rebuilding Nouveau (along with the whole kernel) using the Linux patches or a patched GCC would be enough. But I haven’t looked into it.
18:36 chillfan: Ah, I'll do that for now then
18:43 glennk: imirkin, hey at least you don't need to reverse project the coordinates for the offset variants
18:43 imirkin_: wha?
18:47 glennk: compare with tgsi_interp_egcm() in r600 for an example
18:49 imirkin_: dont have time now, but whatever it is, sounds horrid
18:50 glennk: sample offset in screen space, barycentric coordinates for sampler
18:50 imirkin_: ew. so you have to convert it into i/j offsets?
18:52 glennk: yup
18:53 imirkin_: that sounds ... non-trivial
18:53 imirkin_: *that* is when you really have to understand how interp works :)
18:53 chillfan: thanks for the info bbl
23:24 orbea: imirkin_: now that hwdec works with nouveau + mpv (vo=gpu hwdec=vdpau) I noticed hwdec in mpv looks really bad compared to mplayer, how would I determine if this is a mpv or nouveau issue? for example see: mplayer - https://i.imgur.com/EZYBGSW.jpg and mpv - https://i.imgur.com/GdhWGDn.jpg
23:25 karolherbst: orbea: should be the player
23:25 imirkin_: sounds like scaling is messed up
23:25 karolherbst: or no scaling enabled
23:25 karolherbst: well, no smooth scaling
23:26 orbea: alright, that was my guess, thanks :)
23:26 orbea: that its a player bug
23:26 karolherbst: well
23:26 karolherbst: I wouldn't say it is a bug
23:26 imirkin_: lack of a feature? :)
23:26 karolherbst: maybe there is an option for the scaling or whatever
23:26 karolherbst: :)
23:26 karolherbst: or that
23:26 imirkin_: although vdpau includes scaling
23:26 imirkin_: which must not be getting used
23:26 imirkin_: or ... perhaps it *is* getting used by mpv, and it's broken
23:26 orbea: the vo=vdpau has no such issue :P
23:27 imirkin_: oh
23:27 imirkin_: what's vo=gpu?
23:27 orbea: its what used to be vo=opengl
23:27 imirkin_: oh
23:27 imirkin_: how does that work?
23:27 imirkin_: did they add interlocking?
23:27 orbea: idk the details, it was the one that crashed the system with nouveau
23:27 imirkin_: right
23:28 imirkin_: i just mean ... "now that hwdec works"
23:28 karolherbst: yay, I have no fp64 related fails. just passes and crashes.....
23:28 imirkin_: what made it transition to the "working" state?
23:28 orbea: i thought it was the recent pushbuf patches?
23:28 orbea: but I haven't confirmed
23:28 imirkin_: srsly?
23:29 imirkin_: those were not at all concurrency-related
23:29 imirkin_: nothing in that area changed
23:29 orbea: was just a blind guess since I dont think much else changed :P
23:29 orbea: I'm hesitant to bisect it as bad commits take my system down really fast...
23:30 imirkin_: yeah
23:30 imirkin_: no worries
23:30 imirkin_: i'm guessing a change to mpv
23:31 orbea: could be, but wasn't very recent if so.
23:40 karolherbst: imirkin_: RA is actually better than we assumed regarding 64 bit types: "st b128 # l[0x60] %r101d %r102d" -> st b128 # l[0x60] $r0q
23:40 karolherbst: I would have guessed it would fail here
23:46 imirkin_: :)
23:47 karolherbst: but it seems like codegen really hates 64 bit c[] accesses
23:47 imirkin_: hehe yeah
23:47 imirkin_: i can't imagine that'd work
23:47 karolherbst: well
23:47 karolherbst: it works
23:47 karolherbst: but
23:47 karolherbst: sometimes not
23:47 imirkin_: :)
23:47 imirkin_: "don't do that"
23:48 imirkin_: only gmem can have 64-bit addrs
23:48 karolherbst: yeah
23:48 karolherbst: I write some mkLoad overloads now
23:48 karolherbst: so that I don't have to write crappy code everywhere
23:49 imirkin_: usually this sort of thing is abstracted away in fetchSrc or whatnot
23:49 karolherbst: well
23:49 karolherbst: tgsi only knows 32bit vals
23:49 karolherbst: this makes a lot of things much easier
23:49 karolherbst: so you get TEMP.xy for a 64 bit val
23:49 karolherbst: in nir, that would be ssa_45.x
23:50 karolherbst: or in TGSI you write into TMP.xy, in nir you get vec1 64 ssa_43 dest :)
23:51 karolherbst: just makes tracking sources/dest more annoying if everything gets reduced to 32 bit
23:51 imirkin_: yea
23:52 karolherbst: I am currently thinking if I just insert the split/merges as a fixup, fix RA or really mess with the current code :/
23:52 karolherbst: but as luck strikes me, I still have a few issues with load/stores, so I fix up those first
23:52 karolherbst: like slotting going wrong
23:52 karolherbst: imagine you get a dvec4 load, and your driver_location is 2 :)
23:52 karolherbst: and you iterate over the components
23:53 karolherbst: then you can't just do info->nir[idx].slot[comp]
23:53 karolherbst: * info->in[idx].slot[comp]
23:53 karolherbst: and assignSlots needs to be fixed up as well
23:53 imirkin_: good times.
23:53 karolherbst: :)
23:53 imirkin_: note that vs inputs are treated as 1 slot for dvec4
23:53 imirkin_: but everything else, it'd count as 2 slots
23:54 karolherbst: tarceri has some fixes for that
23:54 karolherbst: so I only fixup FP stuff for now
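The slotting problem for 64-bit loads can be illustrated with a short C++ sketch. It is not the codegen's `assignSlots`: `SlotRef`, `slotFor64()` and `slotsUsed()` are hypothetical helpers, and the rule that only dvec3/dvec4 spill into a second slot, with vertex shader inputs as the exception, is a simplified reading of the remarks above.

```cpp
// Location of one 32-bit value: which vec4 slot, and which component.
struct SlotRef {
    unsigned idx;
    unsigned comp;
};

// Map half `half` (0 = low, 1 = high 32 bits) of 64-bit component
// `comp64` to a slot, starting at slot `baseIdx`. Each 64-bit
// component takes two 32-bit components, so components 2 and 3 of a
// dvec4 land in the following slot rather than at slot[comp].
SlotRef slotFor64(unsigned baseIdx, unsigned comp64, unsigned half) {
    const unsigned flat = comp64 * 2 + half;
    return { baseIdx + flat / 4, flat % 4 };
}

// Number of slots a variable is counted as. Assumption from the chat:
// a dvec4 counts as two slots everywhere except vertex shader inputs.
unsigned slotsUsed(unsigned numComps, unsigned bitSize, bool isVertexInput) {
    if (bitSize == 64 && numComps > 2 && !isVertexInput)
        return 2;
    return 1;
}
```

For a dvec4 with driver_location 2, component 2 starts at slot 3, component 0, which is why indexing a single slot by the 64-bit component number goes wrong.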