01:38prg318: Hi! I have a GTX 1060 and gave nouveau a shot today by completely uninstalling the nvidia blob and booting with the nouveau module. However, when I booted my tty on my 4k display was flickering. It would blank every couple seconds or so, and I got a bunch of 'link rate unsupported by sink' lines in my dmesg. Here is more of dmesg: https://xannode.com/dmesg.nouv.txt . The display is a 4K display FWIW. I
01:38prg318: launched sway and the flickering seemed to stop - it was only on the tty
01:39prg318: Has anyone seen anything like this before? I tested with 4.13.7 and 4.14rc5 (on archlinux64) and would be happy to provide more information if anyone has any ideas
01:40gnarface: prg318: sounds kinda like something i saw once
01:41gnarface: prg318: you using analog cables?
01:41prg318: I'm using displayport
01:41prg318: The setup works fine with windows and xorg/nvidia-blob
01:42prg318: well actually i can't say that - with nvidia blob, tty is low-res (i don't have kms setup for nvidia; i think it's possible now though)
01:42prg318: but it can drive the monitors in X
01:44gnarface: hmm, may be unrelated
01:44gnarface: and i happen to know that the 1000 series cards are really fraught on nouveau right now
01:45gnarface: i'm trying to remember the kernel command-line option though that fixed it for me on an older card, so you can try it anyway
01:45gnarface: hang on i think i have to boot a machine to read it, stand by
01:47imirkin: ooh! fun!
01:47imirkin: [ 7.578995] nouveau 0000:01:00.0: disp: 0x00006465: INIT_GENERIC_CONDITON: unknown 0x07
01:47imirkin: [ 8.057232] nouveau 0000:01:00.0: disp: outp 01:0006:0f84: link rate unsupported by sink
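[Editor's sketch: one way to tally how often each output reports the "link rate unsupported by sink" message quoted above, for example before attaching a dmesg to a bug report. The function name and regular expression are hypothetical and assume the line shape pasted in the log; other kernel versions may print it differently.]

```python
import re
from collections import Counter

# Assumed line shape, copied from the dmesg excerpt in the log:
# "... nouveau 0000:01:00.0: disp: outp 01:0006:0f84: link rate unsupported by sink"
LINE_RE = re.compile(r"disp: outp (\S+): link rate unsupported by sink")


def count_link_rate_failures(dmesg_text: str) -> Counter:
    """Count 'link rate unsupported by sink' messages per output identifier."""
    return Counter(m.group(1) for m in LINE_RE.finditer(dmesg_text))
```

[The text can come from a saved dmesg file; a count that keeps growing for one output would match the repeated blanking described earlier.]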
01:48imirkin: prg318: would it be possible for you to grab a copy of your vbios?
01:48imirkin: prg318: cp /sys/kernel/debug/dri/0/vbios.rom /tmp/vbios.rom
01:48imirkin: and upload it to, e.g., filebin.ca ?
01:50prg318: I don't see that file right now, but I'm also currently booted with nvidia blob. Do I need to be using nouveau to see this vbios.rom ?
01:51imirkin: prg318: also, the edid from your monitor couldn't hurt - e.g. base64 /sys/class/drm/card0-DP-1/edid
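[Editor's sketch: before uploading the EDID blobs, it can help to check that each one at least looks structurally sane. The code below checks only the fixed 8-byte EDID header and the per-block checksum (each 128-byte block sums to 0 modulo 256). It does not validate timings. The function names are hypothetical; the sysfs location is the one mentioned in the log.]

```python
from pathlib import Path

EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
BLOCK_SIZE = 128


def check_edid(data: bytes) -> list[str]:
    """Return a list of problems found in a raw EDID blob (empty if none)."""
    if len(data) == 0:
        return ["empty EDID (connector disconnected or read failed)"]
    problems = []
    if len(data) % BLOCK_SIZE != 0:
        problems.append(f"length {len(data)} is not a multiple of {BLOCK_SIZE}")
    if data[:8] != EDID_HEADER:
        problems.append("base block does not start with the EDID header")
    # Every complete 128-byte block must sum to 0 modulo 256.
    for i in range(len(data) // BLOCK_SIZE):
        block = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        if sum(block) % 256 != 0:
            problems.append(f"block {i} has a bad checksum")
    return problems


def check_all_connectors(root: str = "/sys/class/drm") -> dict[str, list[str]]:
    """Run check_edid on every connector's edid file under root."""
    return {p.parent.name: check_edid(p.read_bytes())
            for p in sorted(Path(root).glob("card*-*/edid"))}
```

[Calling check_all_connectors() returns a mapping such as connector name to list of problems. A clean result here does not rule out the driver reading the EDID incorrectly over DP, which is the possibility raised in the conversation.]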
01:51gnarface: prg318: sorry, it wasn't a kernel cmdline option i was thinking of. it was actually a module option; options drm_kms_helper poll=0
01:52gnarface: prg318: (just so you can try it later if nothing else works, but i'm thinking maybe your issue only sounds similar, and isn't actually related. imirkin knows stuff though. do what he says first)
01:53imirkin: we might be reading the EDID wrong - there have been reports of fail after switching to the generic DP code
01:53imirkin: there was a bug with address-only transactions not being supported, but the 4.14rc5 kernel should def have that fixed
01:53imirkin: perhaps the 4.13.7 one as well, not sure
01:54imirkin: skeggsb, who does not appear to be here, is the real expert on all this modesetting stuff
01:56prg318: actually I misspoke earlier - I need to try 4.14rc5 again; I was having a separate issue when I tried the rc5 kernel. But first, I'll boot with nouveau and upload the vbios and edid. I have two monitors connected FWIW (an ultrawide and a 4k - it's selecting the 4k for tty)
01:56imirkin: well, grab the EDID for both of them, just in case
01:56prg318: yeah; got em both. thanks
01:57imirkin: it'd be preferable if you can file a bug about this at bugs.freedesktop.org including all that info
01:57imirkin: that way there's an easy way to reach you
01:57prg318: i was originally trying to test the ultrawide for wayland performance and had run into this. this may have been the first time testing nouveau on this card/monitor setup (shame on me)
01:58prg318: Sure I can do that
01:59imirkin: xorg -> Driver/nouveau
02:00imirkin: it's complaining about ELD fail, so i wonder if something's getting corrupted
02:00imirkin: or if DP-audio needs something slightly different than HDMI-audio
02:00imirkin: [and it seems eminently likely that we don't have a way of directing the audio to a particular sink... dunno how that's meant to work]
02:03prg318: So this is interesting.. This time it decided to use the ultrawide for tty, but it's having the same flickering problem (but with some additional artifacting problems)
02:03imirkin: the VBIOS includes something we don't know how to process properly
02:03imirkin: my guess is it's not there just for the sheer joy of it
02:03imirkin: GENERIC_CONDITION used to be called DP_CONDITION originally
02:04imirkin: so ... have to decode your vbios to see wtf condition 7 means
02:04imirkin: since it would kill nvidia to provide that sort of info.
02:04prg318: shame D:
02:05imirkin: perhaps ben has already seen this one. dunno.
02:05imirkin: he's really the modesetting master
02:21imirkin: prg318: excellent, thanks a lot!
02:22prg318: no problem! thanks for helping me compile this information
02:22imirkin: mind uploading the vbios.rom as well?
02:22imirkin: i know you have a link to it, but that can go stale
02:22prg318: what's a good upload place ? that's my personal file server it should stay up but i have no problem mirroring it
02:23imirkin: it's a small file - right into bugzilla is fine
02:23prg318: ah perfect
02:23imirkin: also - are the monitors each plugged directly into the GPU, or are they daisy-chained?
02:23prg318: they are plugged in directly
02:23imirkin: ok cool
02:24imirkin: ok, then one more thing if it's not too much -
02:24imirkin: boot with "nouveau.debug=trace drm.debug=0x1e" and upload that dmesg as well
02:24prg318: Sure; I'll do that next
02:25imirkin: it'll be a lot - might have to increase your log_buf_len or something
02:27prg318: good thinking
02:28prg318: 4M should prob be fine?
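[Editor's sketch: after rebooting, one can confirm that the requested parameters actually reached the kernel and see which DRM debug categories the 0x1e mask selects. The bit names are an assumption based on the DRM core of roughly that kernel era and should be checked against the headers of the kernel in use. The helper names are hypothetical.]

```python
# Assumed drm.debug category bits (verify against your kernel's DRM headers).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask: int) -> list[str]:
    """Name the debug categories enabled by a drm.debug bitmask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


def missing_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the wanted parameters that do not appear on the command line."""
    present = set(cmdline.split())
    return [w for w in wanted if w not in present]


# Parameters taken from the conversation; log_buf_len=4M is the value proposed there.
WANTED = ["nouveau.debug=trace", "drm.debug=0x1e", "log_buf_len=4M"]
```

[On a running system the command line can be read from /proc/cmdline and passed to missing_params together with WANTED. Under the assumed bit table, 0x1e selects DRIVER, KMS, PRIME and ATOMIC but not CORE.]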
02:31imirkin: interesting. edid-decode hates both of those EDID's
02:31imirkin: Warning: One or more of the timings is out of range of the Monitor Ranges:
02:39prg318: This is probably relevant (didn't mention in ticket yet). When I run with the nvidia blob, I'll occasionally get flickering and artifacting on the Ultrawide monitor in X.org. To workaround this, I toggle the monitor off and back on with xrandr and that usually fixes it. Never had any issues in Windows with the monitor
02:39imirkin: prg318: the monitor seems to claim to be getting constantly plugged/unplugged
02:40imirkin: or at least that's what we're detecting
02:40imirkin: seems to be going from HPD 1 to 2. not sure what that means.
02:41prg318: that's really strange. I've switched out the cables and video card outputs when I initially saw the weirdness (when I introduced the ultrawide monitor) but I got the same behavior
02:41imirkin: i'm sure it's nothing to do with the hw
02:41prg318: I'm just testing with nouveau today for the first time with this setup
02:41imirkin: we're just misdetecting something
02:42imirkin: but i think that's why you keep getting the flicker
02:42imirkin: we think the monitor's disconnected, turn it off, then we see it's there, turn it on, etc
02:42imirkin: can you try with one monitor?
02:43prg318: yeah sure
02:43imirkin: [unplug the cable... i don't think power off is enough]
02:43prg318: will do
02:44imirkin: [i mean the DP cable]
02:44imirkin: not sure about DP, but at least with VGA the GPU actually powers the PROM machinery on the other end so it can be read out.
02:45prg318: gotcha; gonna give it a go - once with each monitor exclusively plugged in via dp
02:49prg318: So TTY has no flicker and sway seems to run okay when JUST the samsung 4k is plugged in
02:51imirkin: well, 2x giant screens is probably a lot to push on lowest clocks even with a beast like a GTX 1060... dunno if that matters.
02:51prg318: When I boot with just the ultrawide, I'm getting just a black video signal
02:52prg318: sorry - i'm receiving no signal
02:52imirkin: that was more of a "weird" than a "i don't understand what you're saying"
02:52prg318: i don't get anything after the bootloader
02:53imirkin: well that's odd - should be some time after the bootloader and before nouveau loads - do you see stuff then?
02:53imirkin: [would be for a brief period of time]
02:53prg318: I get a few lines yeah
02:53prg318: I actually briefly see something about Firmware Bug
02:54imirkin: i guess this is the future and things boot fast now
02:54imirkin: that's some ACPI thing
02:54imirkin: and most likely unrelated to nouveau
02:55imirkin: this will have to wait for Ben to make any significant progress - i wouldn't really know where to start
02:56imirkin: he's generally around during the week, AU time.
02:56prg318: I connect through a bouncer so I'll be in channel
02:56prg318: I'm going to update the bugzilla ticket with dmesg from the runs with just one monitor plugged in
03:03prg318: Thanks again for helping me get this information together !
03:05imirkin: sorry can't be of actual help =/
03:05imirkin: knowledge for this stuff is unfortunately a bit concentrated
03:07prg318: yeah I can imagine.. and no apologies necessary ! i appreciate all the effort and help. plus, this isn't a "blocking" issue for me - I use nvidia blob for day to day, but ran into this issue trying to do some wayland performance testing for a friend
03:08prg318: but there has definitely been 'something' slightly amiss even with the nvidia blob when this ultrawide is connected (as i mentioned earlier; i will get artifacts sometimes until i toggle off/on the ultrawide with blob)
03:10imirkin: well hopefully ben will either see the issue or be able to provide you with more detailed debugging steps
03:11imirkin: he's pretty busy, so you may have to grab his attention actively
03:11prg318: what's his irc nick?
13:23albamart: anyways though both nvidia and blob do hide the latency per instruction, as said it would not adjust the latency instruction only 1) if memory instruction misses the cache on arbitrary lane, meaning next lanes will be serialized 2) if fast instruction uses a constant and misses the constant on arbitrary lane, same thing as before, the upcoming lanes will be serialized
13:26albamart: what means hiding the memory latency in simd sense is described there, what happens is that CU can schedule up to 20 instruction of different types, 5 per each SIMD, but one needs to have each instruction originating from different waves/warps to be eligible for scheduling
13:27albamart: in other words , that blocking memory instruction is removed from that list
13:28albamart: and will come back when the fetch is toggled to be ready, but once it misses the cache, its latency can be adjusted to be memory fetch instruction, which is where i am making a little of code
13:30albamart: in other words, i will decompose the warp into two different regions, and redirect the upcoming lanes back to the regs of the non-missed ones to be reused
13:31albamart: effectively freeing up number of regs for the next instruction, which can be arbitrarily used later , also the threads itself can do runahead decode and stuff
13:31albamart: and execution sense, threads migrated away can become partial and scheduled to the same SIMD that had lanes serialized
13:34albamart: migration means that when in some arbitrary programs counter instruction aliases the tids
13:35albamart: that means after this instruction (where wise would be to use fast methods for migration)
13:35albamart: the program counter for that decomposed wave is permanently changed again to reflect the new pc
14:49imirkin: skeggsb: why do we have 2x dcb parsers? is it coz they're so much fun to write?
14:49imirkin: should i add a third?
14:50imirkin: or, on a more serious note, would it be beneficial to try to do some refactoring/deletion in nouveau_bios.c
15:06imirkin: ccaione: did you ever figure out the GP106 backlight thing?
15:07ccaione: imirkin: uhm, nope ... not spent much time on that anymore TBH
15:49imirkin: ccaione: ok thanks
16:54MaximLevitsky: I have small question about NV40
16:54MaximLevitsky: not that related to nouveau but you might know anyway
16:55MaximLevitsky: If the card is in D3 state, would it be able to decode any io/mem ranges?
16:55MaximLevitsky: I am not yet using nouveau I must admit, and my system has two nvidia cards which I pass through to VMs
16:56MaximLevitsky: One is NV102, and other is that NV40 which I put in there to play old XP games
16:57MaximLevitsky: I sometimes get total system hang, and I am debugging about what is going on - I thought that maybe both cards wanted to decode the legacy VGA ranges (or one card and host GPU)
16:57MaximLevitsky: I don't use both nvidia cards at the same time so the other card is always in D3
17:28marmistrz: pmoreau, ping
17:44imirkin: MaximLevitsky: legacy VGA range decoding is done by the PCIe bus, which forwards those requests to whichever board it's currently set to, i believe. this is controlled by vgaarb on linux.
17:44imirkin: no clue what happens in a VM passthrough situation
17:44imirkin: also, fyi, NV102 isn't a thing
17:45imirkin: perhaps you mean NV132 aka GP102?
18:19rhyskidd: imirkin: Related to "INIT_GENERIC_CONDITON: unknown 0x07", was this that I looked at back in January: https://github.com/envytools/envytools/pull/75
18:19rhyskidd: might be what you were trying to recall
18:23albamart: anyhow, i was thinking, if the hw does reschedule the instruction once the cache fills in from memory and if i migrate the idling threads and finished ones i.e non-idling away i.e all the warp to next instruction, can i possibly my own reschedule the idling ones into finished ones in the same warp
18:29imirkin: rhyskidd: ah right. then it's not important.
18:30rhyskidd: looking at the new bugs filed in last 24hrs, it's just sad that there's so much breakage on GTX 10xx right now
18:31rhyskidd: laptops with those chips must be getting popular
18:41albamart: when the theory holds , that you can change the registers of already executed ALU instructions with an absolute address, then you'd be able to repoint the lane1 , one after another to cache entries the memory being fetched to cache that is
18:44albamart: like so, first cycles of 400 lane1 has no pointer, following cycles of 400 the step2 , it has contents of the cache of lane2 , then lane3 then lane4-64
19:48albamart: but it's a bit awkward, i'd have to check my notes, what they mean by dispatch conflict, is that a conflict when the scheduling register is not empty, and what the hw would do in such case
19:49albamart: operand collector patent by nvidia was where it was said
20:02albamart: nothing good came out of there, i always hated how nvidia does the things, to maintain high throughput when there is a dispatch conflict on some lane, it can switch execution units i.e functional units, so when programmer wants to not allocate an alu for particular lane, by pointing a value on the absolute scheduling reg, the fucker still gives it an ALU in hw
20:11albamart: there is however a way around that
20:14albamart: it says when there is a constant or data cache miss, it will be put off from the execution units, however does not specifically mention if that applies to dispatch conflicts too
20:15albamart: in other words, you alias a constant to the scheduling reg
20:26albamart: i find the patent being not too specific for the most important part
20:28skeggsb: imirkin: no good reason, just that time hasn't been taken to fully kill nouveau_bios.c, very slowly making it possible to get there... i had a branch somewhere once, maybe still on cgit or something, working towards that end
20:32imirkin: skeggsb: should i try to remove it? [it = nouveau_bios.c]
20:33imirkin: skeggsb: i don't really want to duplicate work you've done, or do something you're going to redo anyways
20:35skeggsb: not sure anymore of what state it's in, or how correct it is
20:35skeggsb: but yeah, i wouldn't complain if it disappeared :P
20:36albamart: oi oi, i read it friggin terribly wrong, NVIDIA does it especially well
20:36albamart: however AMD does it good too
20:37albamart: there thread does not mean a warp , but a single work-item in opencl sense
20:38albamart: because at the top of the patent there is also a term called thread group
20:39albamart: hell i apologise some of the content i just posted was hence incorrect
20:39albamart: i have always read that patent carelessly and wrongly, now that i see, it is perfect what the hw does
20:42albamart: another hand, the day where i pulled slight crap needed to come after a long time still :D
20:42albamart: it is just i managed to do it the whole day again, was only slightly drinking, and it seemed to have helped this time
20:47albamart: so NVIDIA has CCTL inst. this can be used to detect the misses, but also i have the procedures of fast kind to do it without this inst.
20:47albamart: since AMD does not have it
20:49albamart: i try to sleep now, i had basically when i got too intense enormous toothache yesterday , as things look good enough, i will try to sleep now
20:49imirkin: skeggsb: ok cool. what about the fermi clk stuff?
20:49albamart: as the rescheduling granularity is thread based which i missed earlier, this is hence very sane by NVIDIA hw too
20:56albamart: so right, this is what i wanted to see actually, so the scheduler can be slightly enhanced for NVIDIA too, to point the lanes as i said to free up partial warps, i read it entirely through it does not do it yet on cache miss, it adjusts the latency though, hmm
20:57albamart: ai damn, that is the purpose of operand collector pool, it gives whatever what is free at the time
20:58albamart: so nvidias stuff is best the way it is, so i have pulled slight crap before too it seems
20:59albamart: it would lead to instruction cache thrashing but probably it can be controlled in sw too, as i can see now
21:04albamart: though this does not add up still -- the french guy, can't remember the name offhand now, the one who implemented sched opcodes, said his benched perf gains were in the middle of 275 percentage, which is 5fold max, from around 100percentage minimal, but david tarjan got 4fold only with memory instructions consistently
21:07albamart: ah right, i confuse my own shit too, i had a plan for this, yeah the stats look close enough
21:08albamart: cause the performance gain was said by david also to be limited by cache thrashing stuff, ok ok, fine looks good
21:11albamart: anyhow , bye, there could be a hack to get this thrashing threshold to be lifted, but i arrive to there slightly later after a month with details how to do it, first i do the similar thing in AMD cards and every other cards, sw based different kind of method with masks
21:12albamart: then i may end up doing some correction theories for NVIDIA -- probably i care too little there but...bye
21:31albamart: oops already back for a sec, there are major hacks yet to be unleashed, there is still so much on the table, and for power saving too, that ...
21:32albamart: this is just crazy, but luckily the hw base is very sanely done, and provides all the foundation to do that
21:33albamart: hacks in performance boost world, major major ones to be fair, and as said it is also possible to lower the power usage
21:36albamart: i feel better now to see those nvidia cards be flawless too in hw, after now that finally managed to correctly read the most important patent. Why the hell do i care at all i dunno, just an endless jabbering --once started the snowball grows .. slightly disgusted about myself too already, also bored about this all
21:43albamart: in fact i still have some nvidia low end card, but actually no PC anymore to put it into, cause i managed to sell my haswell box
21:44albamart: only two APU's one friendly gift i got , nice portable thinpc and my laptop