00:00dboyan_: karolherbst: The s space is bigger than 0x200 I think. It's at least 16k in size
00:00karolherbst: dboyan_: well, the compute descriptor says "s size: 0x200"
00:02dboyan_: well, no idea. I might be wrong.
00:02dboyan_: karolherbst: btw, did you test my nouveau-cache-test branch?
00:03karolherbst: imirkin_: where is the register stored inside the ValueRef for stuff like s[$r...+0x...]?
00:03karolherbst: dboyan_: no, and I doubt I will be able to do it today. Will do tomorrow
00:04dboyan_: okay, thanks
00:05dboyan_: I think I can try my luck applying for a dev key from feral
00:05karolherbst: dboyan_: you have 25 commits already? :p
00:06dboyan_: yeah, not nouveau-related though
00:06karolherbst: doesn't matter then
00:06karolherbst: dboyan_: https://www.feralinteractive.com/en/news/752/
00:07dboyan_: yeah, I just saw that
00:08karolherbst: I will go to bed, those issues are beginning to annoy me now :(
00:33barteks2x: does nouveau support nvidia gtx1080?
00:36gnarface:wouldn't hold his breath
00:42dboyan_: barteks2x: modesetting support is added in kernel version 4.8, but if you want 3d acceleration, you'll wait for the yet-to-be released 4.12. I guess performance will be poor for pascal even with 4.12.
00:47Calinou: it does technically
00:47Calinou: but it's slow
00:47Calinou: the mouse cursor used to not even work (it was stuck in top-left corner) for a while
00:47Calinou: (try Fedora 25, this will happen)
00:52barteks2x: that question wasn't really for me, I just already had irc installed and running and was on this channel when someone asked me
01:00Horizon_Brave: speaking of which...in cases like this, where modesetting is available, but not 3D acceleration... why is this? Is that reliant on the driver or the firmware that's written by nouveau? Nouveau isn't using the binary blob firmware from nvidia, so it has had to create its own, right? which hasn't been 100% figured out from the nvidia firmware blob? allowing it only some of the functions of theirs?
01:01Horizon_Brave: do I have that right?
01:01airlied: Horizon_Brave: no, nouveau for newer chips is relying on the binary firmware from nvidia
01:02airlied: so far you don't need binary firmware to set modes
01:04Calinou: airlied: is this also the case for accelerated video decode?
01:04Calinou: I remember older cards requiring firmware extraction
01:04Horizon_Brave: ah... but for the functionality of 3d accel, and the more complicated stuff... do nouveau drivers still use the nvidia blob firmware, or do they write their own firmware "copies" of the nvidia ones?
01:06airlied: Horizon_Brave: depends on the gpu,
01:07airlied: Calinou: I don't think nvidia provide redistributable video firmwares
01:07airlied: Calinou: firmware extraction is quite different than nvidia providing firmwares in the linux-firmware project
01:07airlied: Horizon_Brave: later Maxwell chips and newer require signed firmware images from nvidia
01:07airlied: before that nouveau writes its own
01:11Horizon_Brave: ah, got it... thanks airlied
01:13Lyude: imirkin_: doing the last self-check on the NV_fill_rectangle series before I send it out, do I actually need to add this? https://github.com/Lyude/mesa/commit/151e1b40df0ec99db89435861ca666c7fc387520#diff-bca73ce90b7d4aab49b3d4b75ea77ec6R504 I'm getting an unused variable warning here when I compile with this change
01:14airlied: Lyude: the get.c? you can drop that if there are no gets
01:15Lyude: cool, thanks
01:31imirkin: Lyude: yeah, definitely don't need that
01:49Lyude: imirkin: should be on the list now, and also https://paste.fedoraproject.org/paste/OddORXXHf~0I2egnYy0D3V5M1UNdIGYhyRLivL9gydE= yeah get_reviewer.pl lists you there for some reason
01:49Lyude: does that just generate its reviewers based off who's touched the file?
01:53imirkin: if so, that's really bad
01:53imirkin: since those screen files get touched by anyone adding features
01:54Lyude: if that many people touch it then I guess not
01:57imirkin: well, it's just not an accurate representation of reviewers
01:58airlied:isn't sure mesa gets enough patches to require get_maintainers.pl for patch submission
01:58airlied:also ignores cc's anyways
01:59airlied: since the kernel maintainers script adds me for every patch to drm, which really I have the mailing list already
02:02Lyude: uhoh, anyone seen this before? "nouveau 0000:01:00.0: fifo: SCHED_ERROR 06 "
02:02Lyude: got that while running piglit tests
02:03imirkin: Lyude: congrats :)
02:04Lyude: imirkin: hm?
02:04imirkin: Lyude: which kernel is this on?
02:04Lyude: Linux LyudeCowCube 4.11.0-rc1Lyude-Test+ #1 SMP Tue Mar 21 23:15:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
02:04imirkin: is that basically a 4.11-rc1 kernel?
02:05Lyude: yeah, nothing changed
02:05Lyude: that's from drm-next jfyi
02:05imirkin: so ... not 4.11-rc1 at all?
02:05Lyude: right, I forgot that part of the kernel versioning is weird
02:06imirkin:goes to see what is missing from drm-next
02:06imirkin: hm, that should have all of Ben's recovery stuff
02:06Lyude: also let me make sure my drm-next is 100% updated
02:06imirkin: wait, did your GPU hang, or did you just get that message?
02:06imirkin: nah, don't worry about it, i'm sure it's fine
02:07Lyude: imirkin: GPU seems to be hanging, piglit isn't making any progress from where that started
02:07imirkin: is that message getting printed over and over?
02:07imirkin: ok. Ben will be interested.
02:07imirkin: i believe he thought he had defeated it
02:09Lyude: imirkin: think it's important to get a full test run with piglit for the NV_fill_rectangle stuff? or can I get away with only running a couple of tests
02:11imirkin: if it were me, i'd just run a couple tests
02:11imirkin: btw, if you want to run piglit successfully, i'd recommend the following:
02:12imirkin: [hold on, let me update that... -x max-texture-size is another one
02:12gnarface: imirkin: btw i had a strange dream about getting video decoding working on the G92 but i don't know how to test it
02:13imirkin: gnarface: yes, that is strange.
02:13imirkin: Lyude: ok, updated.
02:14imirkin: 2015-08-15. boy, i sure do keep those piglit test runs up to date.
02:14gnarface: imirkin: the trick (in the dream) was apparently to just set the core (cores?) to a "0" clock speed when initializing video playback. then the hardware just picks it up
02:15gnarface: (would in theory self-choose an "appropriate" clock for the video)
02:15gnarface: i may have dreamed this because someone told me this would work while i was very drunk. it also may have been just a dream
02:16imirkin: or maybe this is the dream world?
02:17gnarface: you mean, right now? odds of that are pretty low. i feel way too sober.
02:17gnarface: but, very rarely, i HAVE had prophetic dreams. just, usually not about things like driver troubleshooting.
02:19imirkin: more stock-market-related?
02:19gnarface: nothing so useful.
02:20imirkin: or more generic, like "beware the ides of march"?
02:20gnarface: i used to dream the whole next day in advance. just minutia, trivial conversations, etc. would all turn out exactly as expected. it may just be that my life is really that predictable...
02:20gnarface: but that creepy sense of deja-vu haunts me everywhere
02:20imirkin: perhaps you were influenced by the dream to make it come true? like a self-fulfilling prophecy, as it were?
02:21gnarface: totally possible
02:21gnarface: but it's just as possible someone in #nvidia actually told me this when i begged for advice on the matter late at night, drunk as hell, suffering from insomnia, some 2-4 years ago, and i only just remembered by dreaming about it
02:22gnarface: i guess that sounds also somewhat unlikely though, because someone else certainly would have fixed this by now...
02:23gnarface: unless i'm the last one still rocking a G92 card in a linux box
02:23imirkin: well, there should be one arriving at my door some time this week
02:23gnarface: (also wouldn't terribly surprise me - this one had overheating problems i had to fix by re-seating the heatsink myself)
02:27gnarface: great to hear, imirkin, i'm anxious to help test
02:27gnarface: i tried to upgrade the video card in that streaming box to a newer one the official nvidia drivers still (barely) support only to find out that the power supply in the box doesn't have enough connectors for it. so i'm still stuck with the G92 there.
02:28imirkin: gnarface: not sure what you're looking for in terms of GPU power, but chances are a new GK208-based card will meet most needs. (GT 710/720/730 for the most part, but be careful, could end up with a GF108)
02:29gnarface: imirkin: those ones DO support video playback with nouveau? the point at the moment was just to not spend any money on it.
02:29gnarface: spare parts only, if possible
02:29imirkin: gnarface: yeah, subject to the H264 video decoding bug that's been discussed recently
02:30imirkin: spare parts... more likely to be able to find some GT218's (GeForce 210's), which will also have working video decoding
02:30imirkin: G92 is basically the only GPU without working video decoding accel. (oh, and maxwell+)
02:30imirkin: [you picked wisely]
02:31gnarface: ah, interesting
02:31imirkin: Lyude: ok, so i have your tested-by for the pascal ddx patch?
02:32Lyude: imirkin: thanks re: piglit stuff, and yep
02:32imirkin: thanks. i'll let the patch sit out there for a little while longer and then push it
02:33imirkin: perhaps pmoreau or tobijk will be able to confirm your results
02:35tobijk: *waking up*
02:36imirkin: pascal patch for ddx
02:36tobijk: not yet, sorry
02:36imirkin: yeah, i meant eventually
02:36tobijk: but good timing, going to test now :)
02:36imirkin: i'm not waiting on you or anything
02:36imirkin: just want it to get as much exposure as possible, esp as diff people have diff setups
02:36imirkin: and use X differently
02:37dboyan: imirkin: Any idea if TGSI_PROPERTY_VS_PROHIBIT_UCPS still useful? No one is generating that and codegen is the only consumer
02:37imirkin: although iirc when i tried to bring up maxwell and was messing things up, i was breaking it bigtime.
02:37imirkin: dboyan: then "no" :)
02:37imirkin: dboyan: i'm guessing this was to indicate core vs compat profiles
02:37imirkin: since we would generate bullshit code for dealing with UCPS in that case
02:39imirkin: UCP = user clip plane, btw
02:39imirkin: they're especially painful to deal with once you start adding in shader stages like tessellation and geometry
02:43imirkin: but they're only a thing in compat profiles, and we don't support those stages in compat profiles. so ... problem solved :)
02:43dboyan: imirkin: TGSI_PROPERTY_VS_PROHIBIT_UCPS and TGSI_SEMANTIC_CLIPDIST both set the genUserClip to -1. I didn't look very carefully into the history though
02:43dboyan: imirkin: So I guess it is safe to nuke them?
02:44imirkin: yep, it is
02:44imirkin: the reason that CLIPDIST sets genUserClip = -1 is that if you have gl_ClipDistance, you can't also have gl_ClipVertex
02:44imirkin: and/or UCPs
02:45imirkin: dboyan: btw, i'm doing a build test now, and will push your cache thing shortly
02:46dboyan: imirkin: thanks
02:58tobijk: imirkin: to sum it up: 2D seems to work fine (card is detected, x server starts), no reverse prime (intel) available, 3D accel only with llvmpipe for some reason
02:58tobijk: with your patch
02:59imirkin: tobijk: hmmmm... you might need a mesa from git
02:59imirkin: tobijk: or at least 17.x
02:59tobijk: imirkin: it is git, done yesterday
02:59imirkin: tobijk: can i see your xorg log?
02:59tobijk: yep, give me some secs, other system
03:00imirkin: tobijk: i'm guessing you don't have the firmware?
03:00imirkin: (check /lib/firmware/nvidia/gp106 or whatever)
03:00imirkin: if not, it's in the linux-firmware tree
03:01tobijk: imirkin: 3D accel works if i do DRI_PRIME=1 app when using intel as primary controller
03:01tobijk: anyway, logs incoming
03:04imirkin: i've gotta go... will look tomorrow. please make sure you have the firmware in place.
03:04tobijk: ke, see you tomorrow
07:27mwk: anyone in here coming to Insomni'hack tomorrow, by any chance?
08:35parchd: Just dropped by to say thank you for nouveau. I haven't done any scientifically valid benchmarks, but switching from the nvidia binary drivers to nouveau significantly improved my boot speed and made the system feel more responsive overall.
08:35parchd: So, thanks :)
09:03mangix: parchd: modesetting works wonders :)
10:34pmoreau: imirkin: Oops, I got confused between the tabs, should have answered here instead of #dri-devel.
10:36pmoreau: RSpliet: No idea, I have never played with texbars nor barriers yet. Plus, before playing with texbars, I need to get textures to work, which needs some extra work to upload the information (I think it should be similar to what is needed for ARB_bindless_texture). hakzsam should be able to confirm
11:28pmoreau: hakzsam: During XDC in Helsinki, I tried to add texture support in OpenCL but ran into issues when trying to run the code. We discussed about it, and I think you said we needed to upload some texture information to the GPU (like format and such), as that information was not available at compile time.
11:28hakzsam: yeah, the state tracker has to be updated/changed
11:29pmoreau: Would it be similar for bindless textures in OpenGL, or not?
11:36karolherbst: hakzsam: ohh, while you are here. I have found an issue inside hitmanPro, that generates OOR errors from compute shaders on my Kepler and we think this might be constbuf/global memory related. Odd thing is, from the looks of it, everything seems fine
11:36karolherbst: hakzsam: one example is this shader: https://gist.github.com/karolherbst/cfdb40249073de8cfc867d640a95f634
11:37karolherbst: if I mess around with "not $p0 ld u32 $r0 g[$r0d+0x1c]" and make a "not $p0 ld u32 $r0 g[$r0d+0x10]" out of it, the error is gone (and in several other shaders as well with different offsets)
11:38karolherbst: my current guess is that it is either changing code paths in a way that OOR accesses aren't happening anymore, or the $r0d+0x10 generates an address outside of the valid range of g
11:38karolherbst: no idea how to debug this though
11:39karolherbst: maybe you have any ideas?
13:14dboyan: karolherbst: how do you launch benchmark in hitman?
13:15mupuf: dboyan: there is an option in the launcher
13:16dboyan: mupuf: Oh, found it.
13:16dboyan: "Start Benchmark with Current Settings"
13:17mupuf: crashed for me on intel though
13:17mupuf: will need to make a trace!
13:43karolherbst: dboyan: you can also run it from the command line somehow
13:44karolherbst: dboyan: and you need some patches most likely
13:44karolherbst: dboyan: check out my hitman branch
13:45dboyan: karolherbst: haven't installed on my gt740m machine, how many fps do you think it will get on that one?
13:46karolherbst: dboyan: DDR3 or GDDR5?
13:46karolherbst: dboyan: doesn't matter anyway, close to 2fps
13:46dboyan: pretty close to my guess :)
13:47karolherbst: with hacky tricks I was able to triple it
13:49karolherbst: well but triple for me means I went from 6 to 16 fps
14:24mupuf: ah ah
14:24mupuf: well, it does not run at all on intel, so I guess that's already quite good :D
14:49karolherbst: mupuf: ;)
14:49karolherbst: well I spent the last few days making it run
14:50karolherbst: and they also have bugs inside their engine, which they will fix for nouveau or at least choose a code path which doesn't break stuff on their side
15:17karolherbst: is there something special which needs to be done to use certain parts from global memory within compute shaders?
15:18imirkin: i think some remapping is done somewhere
15:18imirkin: not sure on the details though
15:20pmoreau: Is there any browser that will open a text/x-log document rather than download it?
15:21pmoreau:would like to know before switching the type of the attachments from the latest bug report.
15:23karolherbst: okay, so this is the situation roughly: c1 -> g translation in the pre SSA to SSA translation. 64bit offset is saved inside c7[0x120], which I assume is the correct value and can be read out without issues (and because rendering is fine). Then something is read from g[c7[0x120] + offset], which sometimes causes those OOR errors. Now my assumption is: maybe something got lost while doing the c1 -> g translation?
15:24karolherbst: there is also a flag saved inside c7[0x128], which acts as a predicate for the loads later on
15:25karolherbst: the code is roughly like this: 'if(c7[0x128] >= $some_value) x = g[c7[0x120] + $some_offset]'
15:25imirkin: that is correct.
15:26imirkin: that contains the length
15:26imirkin: of the ssbo
15:26imirkin: it should actually be if ($some_value < c7[0x128]) do things with g[c7[0x120] + offset]
15:27imirkin: since we're supposed to not die if they go out of bounds with robustness and all that
15:27karolherbst: ohh, true
15:27karolherbst: "set u8 $p0 lt u32 $r5 0x0000000c"
15:27karolherbst: and then not $p0 ld
15:29imirkin: and it pulls in a 0 if it's a load and it's out of bounds
15:32karolherbst: why 0?
15:35karolherbst: well we don't die
15:35karolherbst: as I said: the rendering looks correct and everything
15:35karolherbst: which is quite odd
15:35karolherbst: because when I mess there with the generated code, it all looks super broken
15:47Lyude: imirkin, nv_fill_rectangle support is on the list :)
16:06imirkin_: Lyude: yay
16:06imirkin_: karolherbst: 0 because that's what robustness says
16:09karolherbst: imirkin_: mhhh I see.
16:10karolherbst: imirkin_: so if I would replace all wrong ones with a 0x0, then it should still render correctly and it means it's a bug inside their engine?
16:11karolherbst: OHHH I see it now
16:11karolherbst: okay, understood
16:12karolherbst: so c7[0x120] is the offset and c7[0x128] is the size of the region
16:12imirkin_: well, not the offset
16:12imirkin_: the address
16:12karolherbst: and due to robustness we do those checks so that we don't go out of bounds
16:12imirkin_: (which includes any offset into the ssbo)
16:13imirkin_: and so that we have deterministic behavior for when they do go out of bounds
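The robustness check described above (an address in c7[0x120], a length in c7[0x128], and a load that yields 0 when it falls outside the buffer) can be sketched as a small model. This is only an illustration in Python, not the code nouveau generates: the function name, the byte-array "memory", and the extra requirement that the whole access fits inside the buffer are assumptions of this sketch.

```python
def robust_ssbo_load(memory, base, length, offset, size=4):
    """Toy model of a bounds-checked SSBO load that returns 0 when out of bounds.

    memory: bytes-like object standing in for global memory (hypothetical)
    base:   start address of the SSBO (the role c7[0x120] plays in the log)
    length: size of the SSBO in bytes (the role c7[0x128] plays in the log)
    """
    # The access is only performed if it lies inside [0, length).
    # Requiring the full `size` bytes to fit is an assumption of this sketch.
    if offset < 0 or offset + size > length:
        return 0
    start = base + offset
    return int.from_bytes(memory[start:start + size], "little")


# Hypothetical 12-byte SSBO placed at address 16 in a 64-byte memory.
memory = bytearray(64)
memory[16:20] = (0xDEADBEEF).to_bytes(4, "little")
# Data that lives just past the end of the SSBO and must never be returned.
memory[28:32] = (0x11111111).to_bytes(4, "little")

in_bounds = robust_ssbo_load(memory, base=16, length=12, offset=0)
out_of_bounds = robust_ssbo_load(memory, base=16, length=12, offset=12)
```

In this model an out-of-bounds load never touches memory at all, which is why the check only helps if the length it compares against is itself correct.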
16:13karolherbst: okay, nice
16:13karolherbst: that means, due to robustness we shouldn't get the OOR errors, cause we actually check for valid addresses
16:14imirkin_: the engine isn't magic
16:14imirkin_: it only knows it's out of range if it knows the range
16:14imirkin_: and i don't see how it can possibly know the range of a ssbo
16:21imirkin_: Lyude: pretty good for a first mesa series!
16:21imirkin_: mostly minor items
16:22karolherbst: imirkin_: when you have time, could you look over my last two patches as well? (both FMA related)
16:32hakzsam: imirkin_: Lyude, yup, mostly cosmetic :)
16:32hakzsam: karolherbst: no ideas, but it's most likely a cb, not the global mem
16:39Lyude: imirkin_: thank you!
16:45karolherbst: hakzsam: well it was converted from c1 to g+offset
16:45karolherbst: but we do something wrong here and generate OOR errors
16:45karolherbst: rather the GPU generates those
16:50imirkin_: Lyude: btw, on the chance that my feedback sounds negative (seems likely when pointing out issues) - overall very good job on both the piglit and mesa patches.
16:55jamm: imirkin_: i have drm-next, but isn't the out of tree build here better? https://github.com/skeggsb/nouveau/ or maybe i should use the linux-4.11 branch there?
16:55imirkin_: jamm: yes and no
16:55imirkin_: the out of tree thing is nice
16:55imirkin_: unfortunately it's out of tree
16:55imirkin_: which means that it depends on some random kernel commit and you'll never know which one it is
16:55karolherbst: fortunately I have branches for at least the three last kernel releases :O
16:56imirkin_: so you generally can't build the out-of-tree module against any kernel
16:56jamm: imirkin_: right, hmm
16:56jamm: what would you usually do for frequent compile-test-recompile cycles?
16:56imirkin_: that said, any particular commit will generally compile against the latest rawhide available at the time that ben was developing those changes against :)
16:56karolherbst: jamm: if you are on 4.10, you can use my "master_4.10" branch which is skeggsb/master just for 4.10
16:57imirkin_: jamm: assuming i'm not changing the universe, i just build it on a regular kernel
16:57jamm: karolherbst: 4.9 :/
16:57karolherbst: I don't guarantee to keep it updated on a daily base
16:57karolherbst: jamm: I also have a master_4.9 branch
16:57karolherbst: but it contains older stuff
16:57imirkin_: compile/recompile is a lot faster than the reboot cycle in between, so that doesn't slow me down :)
16:57imirkin_: jamm: could you remind me what you're doing? weren't you going to look at maxwell sched codes in the DDX?
16:58jamm: imirkin_: yeah! i've been a bit slow due to work, but i can take a look at this over the weekend
16:58imirkin_: jamm: so why do you need to compile/recompile the kernel module?
16:58imirkin_: or were you just asking theoretically speaking?
16:59jamm: imirkin_: you mentioned i'd need to bringup pascal which meant compiling nouveau against drm-next
16:59imirkin_: not quite
16:59imirkin_: it meant running drm-next :)
16:59jamm: ah, my bad
16:59imirkin_: drm-next has all the stuff
16:59jamm: so i should actually compile the whole thing
16:59jamm: and run it
17:00imirkin_: (well, you also need updated linux-firmware, but that's a separate thing)
17:00imirkin_: jamm: there's also a patch on list to make the ddx work on pascal
17:00imirkin_: jamm: https://lists.freedesktop.org/archives/nouveau/2017-March/027519.html
17:00imirkin_: Lyude has made wild claims that it works
17:01jamm: doesn't linux-firmware come alongside the kernel repo?
17:01imirkin_: it's a separate repo
17:01jamm: ah, yeah i will try that
17:01imirkin_: distros package it up too
17:01imirkin_: some split it, some don't
17:01imirkin_: (some do both, for maximal simplicity)
17:02jamm: yeah, i currently use a part of linux-firmware for my wifi drivers
17:02imirkin_: anyways, you need to have a /lib/firmware/nvidia/gp106 directory
17:02imirkin_: if you don't, your linux-firmware is too old
17:02imirkin_: [or wherever your firmware lives, it's obviously configurable]
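The firmware check described above can be sketched as a short script. It is a minimal illustration, assuming the default firmware root of /lib/firmware mentioned in the log; the function name is hypothetical, and treating an empty directory as "missing" is a choice of this sketch, not something nouveau itself does.

```python
import os


def has_nvidia_firmware(chipset, firmware_root="/lib/firmware"):
    """Return True if <firmware_root>/nvidia/<chipset> exists and is not empty.

    chipset is a directory name such as "gp106". The firmware root is
    configurable on real systems, so it is a parameter here.
    """
    path = os.path.join(firmware_root, "nvidia", chipset)
    # An existing but empty directory is reported as missing (sketch choice).
    return os.path.isdir(path) and len(os.listdir(path)) > 0


if __name__ == "__main__":
    if has_nvidia_firmware("gp106"):
        print("gp106 firmware directory found")
    else:
        print("gp106 firmware directory missing; linux-firmware may be too old")
```

This only confirms that files are present on disk. It cannot tell whether they were available at the moment the nouveau module loaded, for example from an initramfs.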
17:02jamm: right, nvidia had released signed firmware for pascal just a few weeks ago iirc
17:03imirkin_: sounds right. not sure on the precise timing, but "recently"
17:03jamm: ah, afair from the phoronix article XD
17:04jamm: great! so first, i'll compile drm-next and run my machine on that, then compile nouveau against this (not using the out of tree build) and apply that patch
17:04imirkin_: "against this"?
17:04imirkin_: you don't need to build any kernel other than drm-next
17:05imirkin_: the patch i referenced is for xf86-video-nouveau, which is entirely separate from the kernel
17:05jamm: argh, right, the latest nouveau already comes within drm-next
17:06imirkin_: nouveau is part of the kernel. nouveau changes flow through ben to dave to linus
17:06jamm: thanks for the guidance! i'm very new to the kernel layer so i might act stupid in many places XD
17:07imirkin_: no worries.
17:07jamm: my experience in assembly has been in college (which was just last year) doing some x64 nasm stuffs
17:07karolherbst: you won't do assembly within the kernel ;)
17:08imirkin_: so the x86-style CPUs are pretty well-behaved
17:08imirkin_: and generally do what it says on the tin
17:08imirkin_: you execute an instruction, it takes inputs, produces outputs
17:08jamm: i'll also check out maxas just to make sense of the opcodes
17:08imirkin_: and then moves on to the next instruction
17:08imirkin_: internally there's all kinds of stuff to make that faster, but the external view remains unchanged
17:09imirkin_: GPUs -- not so
17:09nyef: I've done a Linux kernel port to an unsupported MIPS machine, and it required basically no assembly work, it was all C.
17:09imirkin_: nvidia gpu's starting from kepler require "scheduling info" to go along with the actual instructions
17:10imirkin_: which tell the instruction dispatcher various things. those various things changed between kepler and maxwell
17:10karolherbst: nyef: yeah, nobody really wants to do assembly anyway, cause you make more errors and it's slower
17:10jamm: nyef, karolherbst: just wanted to make sense of maxwell assembly to be able to look at this issue https://trello.com/c/6LPB2EIS/174-update-maxwell-shaders-with-proper-delays
17:10karolherbst: I see
17:11imirkin_: so you have to tell it that it needs to wait X cycles before executing the next instruction, or to set a barrier to then be waited on by some later instruction, etc
17:11jamm: but i own a pascal card though, but imirkin_ said it shouldn't be an issue, the differences don't seem to be much
17:11nyef: Ah, yeah, at that point, or for compiler backend hacking generally, you'll end up working with assembler.
17:11imirkin_: the code makes it seem like there's a single line of execution
17:11imirkin_: but in reality it's a SIMD executor
17:11imirkin_: (actually several)
17:11imirkin_: which is why the differences come into play
17:12imirkin_: right now the scheduling info we're supplying in the DDX is "wait 15 cycles"... always
17:12jamm: imirkin_: ah, i see, so alongside the opcodes you also want to mention some sort of intentional delay or like a wall of *this is the max cycles you should do* or something
17:12imirkin_: whereas that's almost always unnecessary
17:12imirkin_: hakzsam spent a bunch of time implementing sched info for the mesa compiler backend
17:13imirkin_: which is obviously harder, since it has to work for all instruction sequences
17:13imirkin_: however those learnings need to be taken and applied to the fixed assembly code in the ddx
17:13imirkin_: which, for someone who knows this stuff off the top of their heads, should take 10 minutes. but if you first have to learn what the rules are, it should take a day or two i'm guessing.
17:14nyef: Yeah, this is sounding like a good "get your feet wet" project.
17:14hakzsam: st == 0x0 doesn't mean wait for 15 cycles, it means "wait for all previous deps" which can be more than 15 for memory operations, etc
17:14imirkin_: hakzsam: ah right.
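The idea of per-instruction scheduling info can be illustrated with a toy model that computes how long each instruction must stall so that the next instruction's inputs are ready, instead of attaching one conservative setting to every instruction. Everything here is an assumption made for illustration: the tuple format, the latencies, and the 15-cycle cap are invented, it ignores barriers entirely, and it is not the Maxwell encoding or the scheduler hakzsam wrote for mesa.

```python
def stall_counts(instructions, max_stall=15):
    """Toy stall calculator for a straight-line instruction sequence.

    instructions: list of (dst, srcs, latency) tuples, where latency is the
    number of cycles until dst is usable (made-up numbers, not hardware data).
    Returns one stall count per instruction: how many cycles to wait after
    issuing it before the next instruction may issue.
    """
    ready_at = {}  # register name -> cycle at which its value is available
    issue = 0      # cycle at which the current instruction issues
    stalls = []
    for i, (dst, srcs, latency) in enumerate(instructions):
        ready_at[dst] = issue + latency
        if i + 1 < len(instructions):
            next_srcs = instructions[i + 1][1]
            # Latest cycle at which any source of the next instruction is ready.
            needed = max((ready_at.get(r, 0) for r in next_srcs), default=0)
            stall = max(1, needed - issue)
        else:
            stall = 1
        # Longer waits would need a barrier on real hardware; this model
        # simply clamps them.
        stall = min(stall, max_stall)
        stalls.append(stall)
        issue += stall
    return stalls


# Hypothetical four-instruction sequence with a uniform 6-cycle latency.
program = [
    ("r0", [], 6),
    ("r1", ["r0"], 6),   # depends on r0
    ("r2", [], 6),       # independent
    ("r3", ["r1", "r2"], 6),
]
print(stall_counts(program))
```

Independent instructions end up with a stall of 1 in this model, while dependent ones wait only as long as the producer's latency requires, which is the kind of saving the fixed DDX shaders miss by using one blanket setting.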
17:14jamm: imirkin_: right, so say i figure out what delays i need to use. How should I test them?
17:14imirkin_: hakzsam: then why didn't it work for atomic? :p
17:15hakzsam: it should work
17:15imirkin_: jamm: read the scheduler code that hakzsam wrote in mesa, and apply those policies.
17:15jamm: nyef: yeah that's why it seemed good for me as i'm a total noob at the moment
17:15imirkin_: hakzsam: wasn't the whole reason for implementing this stuff to make the atomic test not kill the board?
17:15imirkin_: jamm: and then test empirically :)
17:15jamm: imirkin_: will do, you mentioned the cpp files a few days ago, i will take a look at them
17:16hakzsam: imirkin_: the main reason was perf of course, but I'm pretty sure that piglit test is wrong
17:16jamm: hakzsam: thanks for your hard work!
17:16imirkin_: hakzsam: so wrong that it kills the board? :p
17:16hakzsam: imirkin_: now it should no longer hang your box because skeggsb improved the recovery
17:16imirkin_: hakzsam: i thought with atomics and whatnot it didn't wait long enough, which is why we didn't want to enable images/compute
17:17hakzsam: it's the only test which fails...
17:17hakzsam: I tested a bunch of real apps with atomics, shared mem etc
17:17hakzsam: works fine AFAIK
17:18hakzsam: and the sched codes for that test looked good
17:18hakzsam: so, the issue is most likely something else :)
17:19imirkin_: i meant BEFORE you implemented sched codes
17:20imirkin_: we were holding back images/compute
17:20imirkin_: i thought it was due to correctness
17:20hakzsam: ah yeah, correct. It was yeah
17:21imirkin_: so your claim that (st 0x0) waits for everything seems inconsistent with that
17:21hakzsam: it's unrelated
17:21hakzsam: the test failed, and it still fails
17:21imirkin_: weren't there other tests that failed?
17:21hakzsam: with/without sched codes
17:21imirkin_: heh ok
17:21hakzsam: it was the only one
17:22imirkin_: oh well.
17:29imirkin_: airlied: did you have any luck figuring out that runpm issue with nouveau?
18:56karolherbst: does this c2 -> g conversion look right to you? https://gist.github.com/karolherbst/faa6520f8cf9f9e73a18d4b3d17f6b25
18:58karolherbst: mhh, mhh for me it looks fine, just want to make sure
18:58imirkin_: yes, seems right...
19:08karolherbst: it's done within NVC0LoweringPass::handleLDST it seems
19:08karolherbst: "// TODO - synchronize the max with the driver."
19:09karolherbst: but I guess this is unrelated.. or at least I hope
19:15karolherbst: imirkin_: what would happen if I just disable that part?
19:15imirkin_: major fail
19:15imirkin_: since the CBs aren't bound :)
19:16karolherbst: okay, where would I need to bind them?
19:20karolherbst: mhhh, interesting
19:21karolherbst: some of those OOR turned into MISALIGNED_ADDR now
19:23karolherbst: I assume this is caused by non bound CBs
19:24imirkin_: in nve4_compute.c
19:24airlied: imirkin_: didn't have much luck reproducing it, can't see any obvious revert either
19:25imirkin_: karolherbst: MISALIGNED_ADDR means you're doing a 64- or 128-bit fetch off of an unaligned address
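A quick way to reason about this is to check an address against the width of the access. The sketch below assumes the required alignment equals the access size (natural alignment), which is an assumption of this example and not something stated in the log; the function name and the base address are hypothetical.

```python
def is_misaligned(address, access_bits):
    """Return True if `address` is not a multiple of the access size in bytes.

    Assumes natural alignment: a 64-bit access needs an 8-byte aligned
    address, a 128-bit access needs a 16-byte aligned one.
    """
    access_bytes = access_bits // 8
    return address % access_bytes != 0


# Hypothetical base address plus the 0x1c offset seen earlier in the log.
base = 0x1000
print(is_misaligned(base + 0x1C, 32))   # 32-bit load at +0x1c
print(is_misaligned(base + 0x1C, 64))   # 64-bit load at +0x1c
print(is_misaligned(base + 0x10, 128))  # 128-bit load at +0x10
```

Under this assumption the same offset can be fine for a 32-bit load and faulty for a wider one, so a change that widens or merges loads can turn a previously valid access into a misaligned one.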
19:25karolherbst: mhh okay
19:25imirkin_: airlied: hrm... i think just leaving the GPU alone for like 5 mins and then trying to use it should be enough
19:26imirkin_: airlied: did you try it on a box with 2 GPUs where you don't start X at all?
19:26imirkin_: my guess is in your setup, X grabs a reference to the GPU
19:26karolherbst: ohh that makes sense. something broke on reator as well
19:29karolherbst: imirkin_: do I just have to adjust res->cb_bindings in nve4_compute_validate_constbufs or is there more to it?
19:29imirkin_: karolherbst: something like that? i don't remember exactly.
19:29karolherbst: ohh wait, I think I see it now
19:29karolherbst: "// Only bind user uniforms and the driver constant buffer through the"... more comments and code
19:29karolherbst: seems like the place
19:39karolherbst: getting OORs again odd
19:49karolherbst: typo: "6 driver constbuts, at 2K each"
19:50karolherbst: uhm, isn't 2k a little small?
19:50imirkin_: i dunno what this is
19:50imirkin_: i mean...
19:50imirkin_: the context
19:51imirkin_: the driver constbuf is 2K
19:51imirkin_: there's 6 of them, one for each shader stage (+ compute)
19:51karolherbst: well sure... but if there is like c7[0x4000] or so, isn't this like obviously oor as well?
19:52imirkin_: nope, it's bound such that the window is correct
19:52imirkin_: i.e. c7 points to the right place
20:52karolherbst: mhhh, well something has to be wrong...
21:05karolherbst: mhh, I am running out of random stuff to try
21:16karolherbst: imirkin_: where is the buffer of the global memory?
21:16karolherbst: or is global == VRAM?
21:16karolherbst: ohh I see
21:16imirkin_: which can map to VRAM or GART
21:16imirkin_: er, SYSMEM
21:17karolherbst: I know
21:17imirkin_: although it'll obviously be slow-as-molasses if it's SYSMEM
21:19karolherbst: maybe the OOR_ADDR error just means .... no that wouldn't make sense. I mean nothing can do useful things with VM addresses outside the allocated regions.. or maybe the GPU just puts it somewhere else? dunno. It just doesn't make sense to me why there is no rendering error at all
21:20imirkin_: could be that there's no PTE backing the address
21:20karolherbst: might be
21:20imirkin_: or could be something else entirely
21:20imirkin_: like wtf is this
21:21karolherbst: no clue
21:21tobijk: imirkin_: 64 threads enablement ;-)
21:21karolherbst: I see
21:22karolherbst: Hyper-Q stuff most likely?
21:22karolherbst: or that other thing
21:22tobijk: karolherbst: that was pure speculation
21:23karolherbst: I know, but there is something related to that
21:23karolherbst: dynamic parallelism maybe
21:23karolherbst: the gpu can launch new kernels itself
21:23tobijk: karolherbst: but you would have a different count on different cards 32-192 on that generation ~
21:25karolherbst: imirkin_: any idea what that "NOUVEAU_NVE4_MP_TRAP_HANDLER" is for?
21:25imirkin_: for handling traps :)
21:25tobijk: and multi processing :>
21:25imirkin_: like when there's a trap
21:26imirkin_: it'll jump to that address
21:26imirkin_: and then you can handle the exception
21:26imirkin_: used for debugging and whatnot
21:26imirkin_: we don't support it :)
21:26karolherbst: sounds interesting
21:26imirkin_: it's not exactly straightforward
21:26karolherbst: yeah :D
21:26karolherbst: will enable it and see what that gets me
21:27karolherbst: compile errors... who would have thought
21:30imirkin_: it won't work.
21:59RSpliet: karolherbst: Do you happen to know which GPUs are in reator?
22:05karolherbst: I guess if nobody else has any ideas left regarding the OOR_ADDR error I might finish all the other things first :/
22:06tobijk: i'd need more context
22:06karolherbst: tobijk: it's compute shader related
22:07RSpliet: karolherbst: that's a shame... I need a guinea pig :-P
22:07karolherbst: and most likely triggered by out of bound access inside c7 or g mem areas
22:08tobijk: mhm conditional jumps into other segments (too far from the originating jump location)? O.o
22:08tobijk: ok i have no clue
22:12tobijk: imirkin_: for the xf86-video-nouveau enablement patch again: 2D works fine, no reverse prime, no 3D accel (firmware is available in /lib/modules/nvidia/...)
22:12tobijk: imirkin_: xorg log: https://hastebin.com/qavucuxete.sql
22:13imirkin_: [ 51.486] (**) Extension "XFree86-DGA" is disabled
22:13imirkin_: been a while since i've seen DGA
22:13RSpliet: mupuf: I'd be grateful if you could test the middle perflvl on my current kernel branch on your nvc4 and nvc8. No rush though, I'm calling it a night
22:13imirkin_: [ 52.083] (EE) NOUVEAU(0): Failed to initialise context object: 2D_NVC0 (0)
22:13imirkin_: tobijk: i don't believe you.
22:14imirkin_: either the firmware is not available, or it's not available when nouveau loads.
22:14imirkin_: tobijk: pastebin 'ls /lib/firmware/nvidia'
22:14tobijk: mh at least the folder was there
22:16tobijk_: imirkin_: https://hastebin.com/yanogufadi.coffeescript
22:16imirkin_: tobijk_: ok. so you have it. is it there when nouveau loads?
22:16imirkin_: e.g. if nouveau is a module and you use an initrd
22:16imirkin_: it probably must be in the initrd
22:17tobijk_: ah ok, then let me check :)
22:20tobijk_: imirkin_: yep it's there, btw 3D accel works fine with intel as primary and nouveau as slave, e.g. DRI_PRIME=1 app
22:21tobijk_: it does not work if nouveau is the primary one
22:22imirkin_: tobijk_: well, the ddx is coming up as noaccel
22:22imirkin_: tobijk_: this is with drm-next?
22:23imirkin_: tobijk_: pastebin dmesg?
22:23tobijk_: it just threw me a nice oops for show
22:23imirkin_: oh, is it runpm-related?
22:23imirkin_: if so, boot with nouveau.runpm=0
22:24tobijk_: was the first time I saw it
22:25imirkin_: oh goodie
22:25imirkin_: anyways, runpm-related, but not the way i thought
22:25imirkin_: either way, try booting with nouveau.runpm=0
22:25imirkin_: gnurou: --^
22:25imirkin_: looks like the acr_r352_bootstrap code gets upset and does an ioread it's not supposed to.
22:27tobijk_: imirkin_: the dmesg is without vgaswitcheroo btw, so intel is primary
22:28imirkin_: tobijk_: anyways, the issue you're having is that the GR unit isn't coming back up after a runpm suspend. not the ddx's fault :)
22:28tobijk_: imirkin_: yep this time, but it was working often earlier, so bad luck (or lucky to show up now :D)
22:28imirkin_: tobijk_: actually if you don't mind, file a nouveau bug about it so it doesn't get lost, not sure what gnurou's availability to look at it immediately is.
22:29tobijk_: ok, have saved the dmesg, will do it later
22:29imirkin_: to confirm though, with runpm=0, everything came up ok?
22:30tobijk_: it came up ok even without
22:30imirkin_: and you have 2d accel and 3d accel, and everything looks like it's rendering ok?
22:30imirkin_: yeah, but it came up without accel, which is cheating
22:30imirkin_: since that's just cpu rendering
22:30imirkin_: and not testing the code i wanted to test
22:30tobijk_: let me check with runpm=0
22:39RSpliet: skeggsb: in your transition to atomic modeset, you seem to have removed the code to calculate an estimate for vblank in usec on pre-GF110 (evo mthd 0x0810)
22:39tobijk_: imirkin_: looks ok now, glxinfo is showing up nouveau
22:39imirkin_: tobijk_: pastebin new xorg log
22:39RSpliet: unhappy about that, as now clock changes flicker again on single monitor set-ups
22:40imirkin_: RSpliet: probably a mistake? :)
22:40tobijk_: imirkin_: https://hastebin.com/beqaqoyona.sql
22:41RSpliet: imirkin_: possibly, no need for me to speculate. Is he back btw?
22:41imirkin_: tobijk_: looks good. mind trying mplayer -vo xv some-video?
22:41imirkin_: RSpliet: don't think so. next week if my count is right.
22:42imirkin_: (he sure knows how to vacation)
22:45RSpliet: Oh I'll ask him next week then
22:51tobijk: imirkin_: eh well it seems to run fine at first glance, but in the long run i get nice hangs and what looks like inverted colors
22:51imirkin_: ooooh, neat-o
22:51imirkin_: i like inverted colors
22:51imirkin_: can you check dmesg?
22:51tobijk: and i did not start mplayer yet
22:52imirkin_: does it complain about INVALID_DATA or something?
22:52tobijk: can't right now, but i'll try to retrigger and gather it over ssh next time
22:52imirkin_: already rebooted? :)
22:53tobijk: imirkin_: not able to switch to console
22:53tobijk: and ssh disabled :/
22:53imirkin_: why no switch to console?
22:53imirkin_: oh, coz it hung.
23:04tobijk: imirkin_: as long as it works, it's fine, replaying a video right now (with vdpau if vlc is not lying to me)
23:04imirkin_: please use xv
23:04imirkin_: there's no vdpau
23:05imirkin_: or if there is, not hw decoded
23:05imirkin_: and it'll be more GL-like than xv-like
23:24tobijk: imirkin_: ok so finally with mplayer and xv it's running fine
23:25tobijk: and the hang did not occur yet (as always if you want to capture something :D)
23:31tobijk: imirkin_: https://hastebin.com/fozawodaso.css <-- that's all i get now when it hangs, seems to be qt related? :>