00:04karolherbst: mhh I think the ram is broken in my fermi laptop :/
00:11karolherbst: now that's funny, even with different RAM (which works great in my laptop usually) the laptop won't start right :/
00:13karolherbst: guess what, the AC adapter wasn't plugged in
00:32mwk: imirkin: so, fmac works as follows
00:33mwk: if the third operand is +-0, it's a fmul.rn; otherwise it's fmul.rz followed by fadd.rn
00:34mwk: except that the intermediate result has wider exponent range and cannot overflow to inf / underflow to 0 until after the add
00:34mwk: I guess they only have one piece of round-to-nearest circuitry...
00:35mwk: I'm still getting some errors, but they're all related to +-0 and underflows
00:41karolherbst: weird stuff
00:55karolherbst: mwk: did you check if the hw implementation works according to the IEEE 754-2008 fmac thing?
00:55mwk: karolherbst: do you mean fma?
00:56karolherbst: should be, right? or what do you mean by fmac?
00:56karolherbst: ohh wait
00:56mwk: floating-point multiply-accumulate
00:56karolherbst: I mean fmac
00:57karolherbst: ahh right, the IEEE thingy doesn't contain fmac :/
00:57mwk: as opposed to fma, which would be ffma, floating-point fused-multiply add
00:57mwk: which this thing most definitely isn't
00:57imirkin: also G80 was released in 2008, so probably not complying to any standards released that year
00:58mwk: fma is older than that though
00:58karolherbst: mwk: no, I meant fmac, I just thought the 2008 thing also describes fmac
00:58mwk: and G80 is a 2006 child
00:59karolherbst: mwk: but I read that the IEEE fma states that it is required that the hw only does one rounding
00:59karolherbst: two roundings aren't allowed
00:59karolherbst: so I was thinking the same could apply to fmac
00:59mwk: fmac does two roundings
01:00karolherbst: mhh, k
01:01karolherbst: then it's not fmac
01:01karolherbst: or not directly fmac :/
01:02mwk: karolherbst: look, you got a reference for that "fmac" thing?
01:02karolherbst: I am searching, but everything tells me that fma and fmac is the same
01:02mwk: because the only IEEE operation I can think of is fma
01:02mwk: and I've ruled that out several years ago
01:05karolherbst: mwk: from the IEEE spec: "The idea behind a Multiply-Add ( or “ MAC ” for “ Multiply-Accumulate ” ) instruction is
01:06karolherbst: they also mostly say FMAC
01:07mwk: ah, annoying
01:07mwk: I'll have to pick a different name then
01:07karolherbst: ohh k
01:07karolherbst: also the IEEE states one rounding
01:07karolherbst: not two
01:08karolherbst: so this fmac you found would be a non-compliant fmac then?
01:08karolherbst: would be still good to know where it complies and where not :D
01:09mwk: karolherbst: told you, it's more or less a normal rounding mul followed by rounding add
01:09karolherbst: mhh okay
01:09mwk: with a few surprises
01:10mwk: first one is, if third operand is 0, the mul rounds to nearest; otherwise, it rounds to zero (leaving the round to nearest circuit available for add, I suppose)
01:11mwk: second one, the intermediate result has 24 bits of precision like a normal float, but the full range of exponents, so you don't get overflows/underflows until the addition
01:11karolherbst: so, fma 0.5 0.7 3 is what?
01:11karolherbst: ohh 0.7 is badly chosen :D
01:15mwk: third one, the normalization step in add has to stop once you hit an output exponent of 1, once you hit it, go directly to the rounding step
01:16mwk: if you normalize until normalized, then round, you'll get a different result
01:17mwk: (it doesn't matter for normal add since once you get below exponent 0, you get to denormals, and rounding doesn't change anything)
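
The behaviour mwk describes for the G80 multiply-add can be modelled with exact rational arithmetic. The following is a minimal sketch, not mwk's actual test code: it assumes finite float32 inputs, ignores the sign of zero results, NaN and infinity handling, and the early-stop normalization quirk, and all function names are made up for illustration.

```python
from fractions import Fraction
import math

def _floor_log2(ax: Fraction) -> int:
    """Return e such that 2**e <= ax < 2**(e+1), for a positive Fraction."""
    e = ax.numerator.bit_length() - ax.denominator.bit_length()
    if Fraction(2) ** e > ax:
        e -= 1
    return e

def round_f32(x: Fraction) -> float:
    """Round an exact value to the nearest float32, ties to even."""
    if x == 0:
        return 0.0
    sign = -1 if x < 0 else 1
    ax = abs(x)
    e = max(_floor_log2(ax), -126)      # denormals all use exponent -126
    quantum = Fraction(2) ** (e - 23)   # spacing of float32 values near ax
    n = round(ax / quantum)             # Fraction rounds half to even
    result = n * quantum
    if result >= Fraction(2) ** 128:    # past the float32 range
        return sign * math.inf
    return float(sign * result)

def trunc_24(x: Fraction) -> Fraction:
    """Chop to 24 significant bits (round toward zero), exponent unbounded."""
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    ax = abs(x)
    quantum = Fraction(2) ** (_floor_log2(ax) - 23)
    return sign * (ax // quantum) * quantum

def mad_f32(a: float, b: float, c: float) -> float:
    """Model of the described op: mul.rz with wide exponent, then add.rn."""
    product = Fraction(a) * Fraction(b)
    if c == 0:
        return round_f32(product)       # third operand +-0: plain mul.rn
    return round_f32(trunc_24(product) + Fraction(c))

def mul_then_add_f32(a: float, b: float, c: float) -> float:
    """Separate mul.rn followed by add.rn, for comparison (finite results only)."""
    return round_f32(Fraction(round_f32(Fraction(a) * Fraction(b))) + Fraction(c))
```

With a = 1 + 2**-23, b = 1.5 and c = -1.5, the exact product lies halfway between two float32 values, so the model returns 2**-23 for `mad_f32` while `mul_then_add_f32` returns 2**-22; a true fused multiply-add would give 1.5 * 2**-23.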
01:17mwk: ok, so now the only problem I have is getting the sign of 0s on output correct
01:20karolherbst: what was cuda sm 1 again?
01:20karolherbst: because there is some fma docs in the cuda stuff for sm_1x
01:21karolherbst: not much though
01:22karolherbst: mwk: "fma.f32 requires sm_20 or higher." and "fma.f64 requires sm_13 or higher."
01:22karolherbst: somehow that sounds strange
01:22mwk: there's a simple explanation though
01:22mwk: they made the G80 without a fma
01:23karolherbst: ahh k
01:23mwk: but when they made the fp64 unit for G200, they started anew and stuffed a fma in there
01:23mwk: but didn't want to touch the fp32 one
01:24karolherbst: I see
01:25imirkin: mwk: btw, if you got a clever way of emulating rcp/rsq f64 with the g200 ops, let me know
01:25karolherbst: mwk: how does that sound to you? "computes the product of a and b at double precision, and then the mantissa is truncated to 23 bits, but the exponent is preserved."
01:26mwk: sounds like what I'm seeing
01:26mwk: ... except when second operand is 0
01:26mwk: er, third
01:26karolherbst: "mad.f32 computes the product of a and b at double precision, and then the mantissa is truncated to 23 bits, but the exponent is preserved. Note that this is different from computing the product with mul, where the mantissa can be rounded and the exponent will be clamped. The exception for mad.f32 is when c = +/-0.0, mad.f32 is identical to the result computed using separate mul and add instructions. When JIT-compiled for SM 2.0 devices, mad.f32 is
01:26karolherbst: implemented as a fused multiply-add (i.e., fma.rn.ftz.f32). In this case, mad.f32 can produce slightly different numeric results and backward compatibility is not guaranteed in this case."
01:26karolherbst: maybe this helps?
01:27karolherbst: this is for sm_1x:
01:30mwk: karolherbst: that's a good one, actually
01:30karolherbst: cuda docs for the rescue! :D
01:30mwk: I have no idea what exactly went wrong, but I handled 0 results wrongly when third op was 0
01:32mwk: actually, let's clean that function up
01:32mwk: I'll just delegate to mul-add whenever I get *any* nan, inf, or 0
01:34karolherbst: is there somewhere a good mapping of cuda sm to chipsets?
01:34mwk: karolherbst: I'm not sure if it's written down anywhere
01:34karolherbst: imirkin: sure the g200 can't do rcp?
01:34karolherbst: ohh f64
01:35mwk: 1x is Tesla, 2x is Fermi, 30 is GK10x, 35 is GK20x, 50 is Maxwell
01:35karolherbst: imirkin: the cuda doc says there should be rcp.rn.f64 on sm_13
01:35mwk: for Tesla, 10 is G80; 11 is G84, G86, G9x, 12 is G200, 13 is GT21x and MCPxx
01:35mwk: karolherbst: there's not
01:35hakzsam: https://en.wikipedia.org/wiki/CUDA (Supported GPUs table)
01:36karolherbst: hakzsam: thanks :)
01:36mwk: PTX docs are a lie, for a big part
01:36karolherbst: ohh k
01:36karolherbst: because of emulation in software?
01:36mwk: lots of the instructions are just expanded to long sequences
01:36mwk: and by long, I mean they involve calls to 200LOC functions
01:36mwk: rcp.rn.f64 is one of them
01:37mwk: which is why I stopped looking at that doc a long time ago, even though it happens to have an actually useful description of mad.f32
01:37karolherbst: rcp was 1/x right?
01:38karolherbst: imirkin: rsq: https://en.wikipedia.org/wiki/Fast_inverse_square_root
01:39mwk: karolherbst: yeah, one of my favorite algorithms...
01:39karolherbst: best one there is
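
For reference, the algorithm behind that link can be sketched in a few lines. This is the well-known float32 bit trick with one Newton-Raphson step, shown only as an illustration: it is not the f64 rcp/rsq emulation imirkin asked about and not anything nouveau is known to use. The function name is made up, and the input is assumed to be a positive, normal float32 value.

```python
import struct

def fast_rsqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive, normal float32 value."""
    # Reinterpret the float32 bit pattern as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Magic constant minus half the bits gives a rough initial guess.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson refinement step.
    return y * (1.5 - 0.5 * x * y * y)
```

After a single refinement step the result is typically within a fraction of a percent of the exact value, which is far from what a correctly rounded rsq needs.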
01:43mwk: I cannot just defer to add-mul for all cases, big*big - inf is different..
01:47karolherbst: mwk: maybe if the difference isn't that big, there could be a variable like NV50_FAST_ARITH, which reduces accuracy, but improves performance by using instructions like that?
01:48mwk: karolherbst: that instruction is *more* precise than add / mul combination
01:48karolherbst: yeah but you said big*big-inf is odd
01:49mwk: compute it separately, and big*big overflows to inf, and inf-inf is NaN
01:49mwk: use mad, and it immediately computes -inf
01:49karolherbst: ohh right
01:52karolherbst: is this a big problem though? :/
01:54mwk: there are numeric algorithms out there where every difference from IEEE is a problem
01:54mwk: that said, the big*big-inf issue is not one I'd be concerned about
01:55mwk: the unreliable rounding mode for the mul is the bigger problem
01:56mwk: the test passes, let's spice it up
01:57mwk: yep, still passing with modifiers
02:00mwk: mad... for once, a fitting name
02:37KhazAkar: Hi all!
02:39KhazAkar: I need to type "nvapeek 110100"?
02:43KhazAkar: Btw - I can't use it -> "WARN: Can't probe 0000:01:00.1" "PCI Init failure!"
02:43mwk: KhazAkar: sudo works wonders
02:44KhazAkar: Ok,thanks. "00101000: 8f44ac1c"
02:44karolherbst: thank you
02:46KhazAkar: No problemo :P
02:46KhazAkar: If something more is needed,just ask
02:48KhazAkar: And how about my lid problem? On blob It doesn't exist
02:48karolherbst: what lid problem?
02:49KhazAkar: I close a lid on ~15 sec,then open it and screen don't Wake up. I need to fast close and open it again to have screen on
02:50karolherbst: KhazAkar: did you check dmesg after it stays black?
02:50KhazAkar: Not yet,wait,I need to remove blob and reboot notebook
03:01KhazAkar: Checked dmesg - nothing
03:03karolherbst: KhazAkar: can you increase the brightness through /sys/class/backlight ?
03:04karolherbst: I don't know how the nouveau provider is called, but there should be a subdirectory
03:04KhazAkar: I have there acpi_video0 and nv_backlight folders
03:04karolherbst: k, then check nv_backlight
03:04karolherbst: and check the actual_brightness
03:05KhazAkar: I cannot do echo on actual brightness
03:06karolherbst: and you have to echo into brightness
03:06karolherbst: cat max_brightness > brightness
03:06karolherbst: as root
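
The sysfs steps above can be sketched as a small helper. It is a minimal sketch that assumes the usual backlight layout with `max_brightness` and `brightness` files in the device directory; the function name is made up for illustration.

```python
from pathlib import Path

def set_max_brightness(device_dir: str) -> int:
    """Write the value of max_brightness into brightness; return that value."""
    dev = Path(device_dir)
    maximum = int((dev / "max_brightness").read_text().strip())
    # actual_brightness is read-only, so the write has to go to brightness.
    (dev / "brightness").write_text(f"{maximum}\n")
    return maximum
```

Run as root, `set_max_brightness("/sys/class/backlight/nv_backlight")` would do the same as the `cat max_brightness > brightness` line; pointing it at the `acpi_video0` directory instead targets the other provider.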
03:06KhazAkar: Now actual brightness is 100
03:07KhazAkar: But nothing changed on the screen
03:07karolherbst: mhh k
03:07karolherbst: maybe acpi_video0 then
03:07KhazAkar: Under acpi_video0 it changes
03:07karolherbst: so the screen is back on?
03:08KhazAkar: I have now screen back on,Then I make changes Under acpi_video0
03:08KhazAkar: If Needed I can attach second screen then make it
03:08karolherbst: I think you need to boot with "acpi_backlight=native" then
03:09karolherbst: or acpi_backlight=video
03:09karolherbst: not sure though
03:09karolherbst: I think it is native though
03:09karolherbst: KhazAkar: so then reboot with acpi_backlight=native
03:10karolherbst: and see if that works without tinkering inside sysfs
03:10karolherbst: KhazAkar: you should have also brightness keys on your laptop
03:10karolherbst: fn+ something, right?
03:11KhazAkar: yup,fn + Arrow up/down
03:11karolherbst: and I suspect they didn't work yet?
03:11KhazAkar: They worked
03:11karolherbst: ohh k
03:11karolherbst: then the problem might be somewhere else actually
03:11KhazAkar: Because they're controlled via Dell_wmi module
03:12karolherbst: so on a hardware level
03:12karolherbst: mhh k
03:12karolherbst: because sometimes userspace has to fetch those keys
03:12karolherbst: and then it depends on what acpi_backlight thing is selected
03:12karolherbst: I have for example intel_backlight, which would be nv_backlight for you
03:12karolherbst: but that doesn't work
03:13karolherbst: which would be acpi_backlight=vendor
03:13karolherbst: or was vendor the wmi thing?
03:13KhazAkar: I needed to bind Keys depended to control audio :p
03:13KhazAkar: Because Debian do it right,siduction no :D
03:14KhazAkar: With native Control don't work :D
03:14karolherbst: those keys always causes some sort of troubles
03:14karolherbst: I am happy that all important ones are working here
03:15KhazAkar: So acpi_backlight=vendor?
03:15KhazAkar: Native = not working Control,even with keys
03:16karolherbst: then vendor
03:16KhazAkar: I need to add acpi_osi line?
03:16karolherbst: that acpi_osi line is bad hack essentially
03:17KhazAkar: On some notebooks acpi_osi='Windows 2012' can help :p
03:17karolherbst: yeah well
03:17karolherbst: it still is a bad hack
03:18KhazAkar: Sometimes It's needed to have Linux working properly :)
03:18RSpliet: karolherbst: the bigger issue is that ACPI is a bad hack
03:19karolherbst: KhazAkar: yeah, because the firmware asks the kernel what it supports, and if something goes wrong the firmware messes up for no apparent reason
03:19RSpliet:tends to approach ACPI with extreme pragmatism
03:19karolherbst: RSpliet: and I thought ACPI should make everything so easy cause of unified API :D
03:19karolherbst: well "API"
03:20KhazAkar: With acpi_backlight=vendor worked like worked without that line :D
03:21KhazAkar: About changing brightness of course
03:21RSpliet: karolherbst: ACPI was designed anticipating the rise of bigger idiots
03:21RSpliet: *not anticipating
03:21KhazAkar: But It changed nothing about my lid problem, eh
03:22karolherbst: things like that are just stupid
03:22karolherbst: most ACPI issues only occur on some distributions because of some insane config
03:22KhazAkar: I always have hard to solve problems,eh :/
03:22karolherbst: KhazAkar: anyway, that's no nouveau problem
03:22RSpliet: there's BIOSes with faulty tables for Linux, KhazAkar: try all that acpi_osi stuff anyway
03:23karolherbst: RSpliet: yeah, but the backlight controls are working
03:23karolherbst: maybe the lid opened event doesn't arrive though?
03:23RSpliet: throw everything at it you have, and see what sticks
03:23KhazAkar: karolherbst So why with blob it works flawlessly?
03:23karolherbst: KhazAkar: maybe because there is no nv_backlight thingy ;)
03:23karolherbst: no idea though
03:23karolherbst: such issues are just messy
03:23RSpliet: karolherbst: nouveau has backlight control when necessary, but don't ask me about the specifics
03:24RSpliet: (esp. on the bit where it needs to determine when it's "necessary")
03:24karolherbst: RSpliet: do you know at which clock the gpu needs to have a different config for memory clocks?
03:24karolherbst: RSpliet: like 2400MHz on gddr5 kepler where you start using the second pll
03:24KhazAkar: With nouveau - after opening the lid, no screen. With blob - works like a charm :p
03:24RSpliet: karolherbst: the first PLL on my board can only be configured to 300 or 600MHz, so I guess the answer is "always"
03:25karolherbst: okay, so above 600MHz you have to do something else in addition?
03:25karolherbst: or below 300
03:25RSpliet: khazakar: does the machine suspend when you close the lid? does it wake up properly? got logs?
03:26RSpliet: karolherbst: no, always
03:26RSpliet: you can't make a 400MHz clock with one PLL
03:26RSpliet: it's 300, or 600... check your VBIOS
03:26karolherbst: ohhh k
03:26RSpliet: (if it's GDDR5 that is, my sample size is 1)
03:26KhazAkar: Dmesg clean,not suspend
03:26karolherbst: nah mine is ddr3
03:26RSpliet: check anyway, could be informative ;-)
03:27karolherbst: RSpliet: you know what, I will clock down in 100MHz steps through coolbits :D
03:27karolherbst: I did it a bit and only _one_ line changes in the SEQ script...
03:27RSpliet: gl hf
03:27RSpliet: check the resulting clocks too for a laugh
03:27karolherbst: well, PMPLL.MCLK0_COEF changes
03:28KhazAkar: I think its nouveau thing,because after connecting second screen "no signal"
03:28RSpliet: KhazAkar: if you are running a 4.4 kernel, please file a bug for that over at the freedesktop bugzilla
03:29RSpliet: our expert for that I'm afraid is Australian, so you'll have a hard time catching his attention on IRC ;-)
03:29KhazAkar: It's like hardware switching off a card
03:30KhazAkar: After fast close/open lid it works,but after it the "screen" apo appeares
03:31KhazAkar: On dmesg only spam with unknown wmi event.
03:33RSpliet: well, there's an important hint
03:33karolherbst: KhazAkar: you might want to report those wmi events upstream
03:34karolherbst: could be important ;)
03:34karolherbst: I bet these are those lid events
03:34KhazAkar: Sure,but it doesn't appear when I'm using blob driver ;_;
03:36KhazAkar: It's like missing communicating
03:36RSpliet: KhazAkar: whether it's a dell_wmi issue, a nouveau issue or a blob workaround of a broken ACPI we don't know, either way we need different eyes on it than karolherbst and me, because we're not the expert
03:36RSpliet: so file a bug!
03:37karolherbst: RSpliet: any idea how high ddr3 can be clocked?
03:37RSpliet: no idea... 1GHz?
03:38karolherbst: mhh nvidia-settings displays doubled clocks?
03:38karolherbst: cause I am on 2100MHz
03:39mupuf: has anyone tried to compile mesa with libdrm-git?
03:39karolherbst: RSpliet: but yeah, 1GHz seems about right somewhat
03:40karolherbst: maybe a bit more or less
03:41karolherbst: KhazAkar: and you should state that suspend also doesn't work through closing the lid
03:41karolherbst: that's like the important part ;)
03:43KhazAkar: It's like half suspend :D
03:44karolherbst: well as long as ssh works, it isn't ;)
03:44mwk: sooo... quadop and cvt
03:44KhazAkar: It's locking the screen,not suspend :p
03:44karolherbst: that's weird
03:45karolherbst: that means the lid closed events arrive somehow
03:45karolherbst: well no clue then
03:45KhazAkar: My notebooks are always "magic" under Linux
03:47KhazAkar: Suspend makes more magic when external screen are connected :D
03:49karolherbst: RSpliet: different configuration of ddr3 reclocking needed somewhere between 900 and 1000
03:49karolherbst: ohh wait
03:49karolherbst: there is other stuff too
03:49karolherbst: wow, how nice
03:51karolherbst: ahh, k
03:59RSpliet: mupuf: I take it this means "no"
04:00mupuf: RSpliet: hehe
04:01mupuf: I must be doing something wrong
04:01mupuf: Will have a look later
04:06karolherbst: does it make sense that mem training is only needed when the timings are changing?
04:06karolherbst: but it looks that way :/
04:07karolherbst: 810 -> 900 MHz, no changed timings, no link training
04:07RSpliet: for GT21x, memory training is done once, and once only
04:07RSpliet: check what nouveau does there
04:07karolherbst: 900 -> 1000 MHz, changed timings, link training
04:07karolherbst: RSpliet: this is mem training right? https://gist.github.com/karolherbst/6280616f9678b98dcd57
04:07karolherbst: what is it then?
04:08RSpliet: no idea, but check the GDDR5 code for more details
04:08karolherbst: k, first I will analyze my trace though
04:18karolherbst: those waits are also important, right?
04:56codehotter: OK, I downgraded to kernel 4.3 and I still have display issues. This time with connectors with a display port icon, but they're the wide white plugs
04:57codehotter: I don't have access to the hdmi display anymore. But I have much the same problems
04:57codehotter: strange cursor behavior, computer seems slow, no output on one of the monitors
04:58codehotter: Ah, the plug is apparently called "DVI"
04:58codehotter: xrandr thinks it's HDMI
04:58codehotter: again ,everything works fine in Windows
05:00codehotter: karolherbst: your troubleshooting procedure was, close X, then ?
05:00codehotter: Should I upgrade back to 4.4?
05:01karolherbst: ohhh, did you have this issue with a non-working internal display after plugging in an external one?
05:01karolherbst: or was it something else
05:06codehotter: Internal display is always working
05:06codehotter: Except for the fact that it appears sluggish and my cursor disappears unless I'm moving it
05:06codehotter: (the cursor is always visible as soon as I unplug the external monitor)
05:07codehotter: I have problems getting output on the external monitors
05:07karolherbst: RSpliet: my findings so far: https://gist.github.com/karolherbst/6280616f9678b98dcd57 does anything look familiar to you?
05:07karolherbst: codehotter: ahh right
05:07codehotter: sometimes if I move my cursor to the external monitor they multiply and I can make cursor fireworks
05:07codehotter: it looks kind of cool
05:08codehotter: I have 3 monitors in this setup, is that an acceptable configuration?
05:08RSpliet: karolherbst: yes... well, I do believe the GDDR5 script contains more details than that
05:08codehotter: internal, and 2 external, both connected via DVI
05:08codehotter: one of them is recognized as DP-1-3 and the other as HDMI1
05:08karolherbst: RSpliet: k
05:09karolherbst: I also found some patterns in it, but nothing useful yet
05:09RSpliet: a lot of those if statements are misleading, as there's a "only write registers that actually change" check in the SEQ script gen
05:11karolherbst: RSpliet: yeah I know, I have some more SEQ scripts I want to look at. I just wanted to start with the smaller steps to see some patterns in the values
05:11karolherbst: but I have also big reclocks
05:12RSpliet: I think you'll do yourself a big favour if you first read the current code, skipping over the if(mode == ?) parts (because they tend to contain stale static not-so-useful reg writes)
05:13RSpliet: I think "mode" was meant to have the connotation of "PLL or just postdiv", but the name "mode" is so non-descriptive that anything I wrote instead checks for whether the new mode has a PLL coef
05:13karolherbst: yeah there is also a mode for kepler gddr5 which decides if you have one pll or two
05:14karolherbst: well for kepler in general I think
05:14RSpliet: the name is rubbish ;-)
05:15codehotter: Can I just ask if this setup is supposed to work and a configuration issue or if it's not supported yet? I have intel card, and GK107GLM, and I want to use my laptop display and 2 external monitors using xrandr
05:18karolherbst: it should usually work
05:18karolherbst: but as always, there can be bugs
05:20codehotter: I think I can disable the intel card in the bios. Would that help?
05:21karolherbst: codehotter: if you don't care about battery lifetime, sure
05:22codehotter: Not sure how the bios manages that if different displays are wired to different cards
05:22codehotter: but I'll try it
05:23karolherbst: RSpliet: well, nothing really fits that well yet :/
05:25codehotter: Everything works great now when I disabled "Switchable graphics"
05:25codehotter: What actually physically happens with that button? Why is it working now if it wasn't working before? Surely there's not a motorized mechanism changing the cable wirings inside the case?
05:25karolherbst: which button?
05:26codehotter: The "switchable graphics" button in the bios setup
05:26codehotter: It's now disabled
05:26karolherbst: ahh k
05:26karolherbst: well I assume the bios does some smart stuff
05:26RSpliet: karolherbst: not? your gistfile line 19 -> https://github.com/RSpliet/kernel-nouveau-nv50-pm/commit/92937716e1687ed5b15f74e6a955ea3dff717d18#diff-2245c015e8c518edce5ab5509cde189aR484
05:26RSpliet: line 484
05:26karolherbst: RSpliet: yeah some writes are there on both sides
05:27karolherbst: but I would say it fits around 25%
05:27RSpliet: well, a) you're missing some of the writes NVIDIA does outside the SEQ script
05:28karolherbst: RSpliet: but that's only before and after, right?
05:28RSpliet: b) I'm fairly sure that's an underestimation; it's likely to be closer to 75%
05:29mwk: hehe, quadop instruction with src1 coming from s hangs the GPU
05:31mwk: huh, quite permanently on G98
05:31karolherbst: so real dead now?
05:31mwk: PGRAPH reset didn't help
05:32mwk: ok, rebooting fixed it
05:32mwk: perhaps I just missed something in my setup sequence
05:32mwk: it's far from a proper initialization after all
05:33KhazAkar: It is normal when Nouveau have better performance even 2 times than nvidia blob? Quadro NVS 160M :D
05:33karolherbst: mhh, is the rendering the same?
05:33KhazAkar: Checked with glxgears. On blob ~600 FPS,on Nouveau - ~1.2k
05:34RSpliet: glxgears is not a benchmark
05:34karolherbst: KhazAkar: prime offloading?
05:34karolherbst: or was it just glxgears?
05:34KhazAkar: I don't have second graphics card on my Dell,only Quadro
05:34karolherbst: but yes, ignore the result from glxgears
05:34karolherbst: better run unigine heaven
05:35KhazAkar: Firstly,I need to download it and install :P
05:37KhazAkar: Ignore FreeCAD too? Similar performance, sometimes Nouveau better, but on closed drivers Wings3D works better... :P
05:37karolherbst: try unigine ;)
05:37karolherbst: but I am not sure if it will run at all :/
05:37karolherbst: I don't know what it requires
05:38KhazAkar: OpenGL 4.0
05:38KhazAkar: DX9/DX11 or OpenGL 4.0
05:39karolherbst: for tessellation yes
05:39karolherbst: but it will also run on OpenGL 3 gpus
05:39karolherbst: but I don't know what you get with yours
05:41KhazAkar: My graphics card is on a PCIe slot in my notebook, so maybe change it to a better one? :D
05:46KhazAkar: Or use eGPU via ExpressCard :>
05:48KhazAkar: On blob blender works slower,wth?
05:57mwk: hmm, quadop with dx/dy mode doesn't work at all in compute mode, it seems
05:57mwk: always gives 0
05:58mwk:looks at the funny unknown methods
06:02RSpliet: mwk: out of curiosity; are there any ops that might behave differently depending on the grctx? like, bits that might represent a DX vs. OpenGL "mode"?
06:02mwk: RSpliet: I haven't tested that yet
06:02mwk: but I'm expecting some
06:03mwk: there should be a bit controlling 0*NaN behavior somewhere
06:03mwk: with old assembly shader languages, that was defined to be 0
06:04mwk: which is rather important since there were no branches and you did conditions by doing shit like (a < b) * c + (a >= b) * d
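
The branchless select mwk mentions shows why the 0*NaN rule matters. The sketch below is illustrative only: `legacy_mul` is a made-up name for a multiply in which any zero operand forces a zero result, standing in for the old assembly shader semantics, and it says nothing about how the hardware bit is actually wired.

```python
import math

def ieee_mul(x: float, y: float) -> float:
    """Ordinary IEEE multiply: 0 * NaN and 0 * inf are NaN."""
    return x * y

def legacy_mul(x: float, y: float) -> float:
    """Multiply where a zero operand always yields zero, even against NaN/inf."""
    if x == 0.0 or y == 0.0:
        return 0.0
    return x * y

def select(a, b, c, d, mul):
    """Branchless 'c if a < b else d', written as (a < b) * c + (a >= b) * d."""
    return mul(float(a < b), c) + mul(float(a >= b), d)
```

With IEEE rules, `select(1, 2, 5.0, math.nan, ieee_mul)` is NaN, because the unselected NaN operand leaks through the multiply by zero; with `legacy_mul` the same call returns 5.0.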
06:08RSpliet: hmm, I reckon those bits could equally well hide in the program launch command
06:11mwk: yep, found them
06:11mwk: there's a method for that
06:21mwk: another one bites the dust
06:30karolherbst: nice one
06:49mwk: RSpliet: btw I'm also expecting at least the quadop insn to work differently in pixel shaders than other types
06:50mwk: and fwiw the tesla "launch command" takes no parameters
06:51mwk: the program control blocks in memory only started appearing on fermi
07:28Tom^: karolherbst: sup.
07:28Tom^: karolherbst: ok time to figure out how to pull in these commits into this branch.
07:28Tom^: karolherbst: or wait, "i need more, ping when i got time."
07:28Tom^: karolherbst: ping.
07:32karolherbst: Tom^: https://github.com/karolherbst/nouveau/commits/tom
07:32karolherbst: ohh wait
07:33Tom^: nothing new there
07:33Tom^: mmh :p
07:33karolherbst: but try it out with debug=clk=trace, so that we have the data
07:34karolherbst: and it might be that memory won't be reclocked high enough sometimes
07:34karolherbst: but that's more because the pmu memory counter is somehow bad
07:35Tom^: you are also quite behind skeggs according to github
07:36Tom^: "52 commits behind skeggsb:master." :P
07:36karolherbst: ahh he pushed
07:37karolherbst: mhhh noi
07:37karolherbst: that's okay
07:37karolherbst: doesn't matter
07:47Tom^: karolherbst: yea seems to work rather nicely now. now you just got to figure out the flicker from changing cstate :P
07:47Tom^: karolherbst: quite annoying when it jumps ~10 times or so. instead of when i manually set it one time xD
07:48Tom^: karolherbst: it changes cstates, seems to detect load quite nicely and only put core etc at proper levels, goes back to 07 when nothing is used etc.
07:48Tom^: so i got no complaints.
07:48RSpliet: karolherbst, Tom^: for single-monitor set-ups that's solved by syncing to VBLANK, for multi-monitor set-ups we probably need the linebuffer to keep feeding the scanout engine with pixels while memory is inaccessible
07:49Tom^: RSpliet: yea solving to vblank isnt quite a solution either
07:49Tom^: RSpliet: some people run window managers without compositors.
07:49Tom^: *syncing to
07:49RSpliet: doesn't matter
07:49RSpliet: every monitor has a VBLANK period
07:50Tom^: ah ok yea.
07:50RSpliet: which is long enough to change your memory clock 3 or 4 times ;-)
07:50karolherbst: Tom^: you should also test if you hit the 60 fps mark with most games, and if something runs slower check that the gpu has max memory clock and not maxed out core load or max core clock
07:50RSpliet: without causing flicker
07:51Tom^: karolherbst: yea did that with glxspheres, it ran core at ~600 mhz and mem at 6999 , and jumped up and down at times without what i can tell impact the mpixels count.
07:51Tom^: and unigine ramped it up to max.
07:52karolherbst: I am relieved we figured out those pmu issues, so that we can now implement dyn reclocking without hanging the driver :D
07:52karolherbst: Tom^: yeah, there is some memory stuff still missing
07:52karolherbst: Tom^: currently the pmu just sends a request when the core load is high enough
07:52karolherbst: usually when memory is too slow, the core load seems rather high, and drops after memory got reclocked
07:52karolherbst: currently such loads will create an endless reclocking cycle
07:53karolherbst: because it selects a cstate where you need a higher pstate
07:53karolherbst: and higher pstate means higher memory clock
07:53karolherbst: but then the core load suddenly drops and memory load is too low anyway, because of the stupid pmu memory counters
07:53karolherbst: so it clocks down the pstate, because lower cstates don't need the highest pstate
07:54Tom^: mm ok
07:55karolherbst: currently my plan is to check the loads on the pmu and send a reclock request when the load goes outside a specified load range, and the kernel module decides whether to reclock or not
07:55karolherbst: on reclock it acks the requests and the pmu resets some state, otherwise the pmu will only bother with higher or lower loads (depending on the situation)
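
One way to read that plan is as a load window with an acknowledgement step. The following is a rough sketch under that reading, not the actual PMU firmware or kernel code: the class name, the method names and the 30/80 percent thresholds are all assumptions chosen for illustration.

```python
class LoadMonitor:
    """Models the PMU side: requests a reclock when load leaves [low, high]."""

    def __init__(self, low: int = 30, high: int = 80):
        self.low = low
        self.high = high
        self.pending = None  # last unacknowledged request: (direction, load)

    def sample(self, load: int):
        """Return 'up', 'down', or None if no request should be sent."""
        if load > self.high:
            direction = "up"
        elif load < self.low:
            direction = "down"
        else:
            return None
        if self.pending is not None and self.pending[0] == direction:
            prev_load = self.pending[1]
            # Without an ack, only bother again if the load got more extreme.
            if direction == "up" and load <= prev_load:
                return None
            if direction == "down" and load >= prev_load:
                return None
        self.pending = (direction, load)
        return direction

    def ack(self):
        """Kernel reclocked: reset the state so the next excursion is reported."""
        self.pending = None
```

In this model the kernel module is free to ignore a request; the monitor then stays quiet until the load moves further in the same direction or crosses to the other side of the window.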
07:59Tom^: karolherbst: https://gist.github.com/anonymous/79ef19d9fe6a3a1b78eb this is from some testing if its of any value.
08:04karolherbst: not by itself
08:04karolherbst: this is just a dump what the dyn reclocking code on the kernel is doing. I bet there are a lot of smarter ways to do it though
08:04karolherbst: and faster
08:06karolherbst: Tom^: but maybe you get the feeling your fans are quieter for not so heavy workloads?
08:13Tom^: my fans was always kinda quiet on this non reference card
08:13Tom^: its some dual fan thingie :P
08:13karolherbst: ohhh k
08:14Tom^: just that its recently started to vibrate a bit in one of them. so im thinking about ordering some water cooling setup. but im not sure its so expensive :<
08:14karolherbst: maybe you just want to tighten the screws then
08:20karolherbst: RSpliet: I think I got now all possible combinations of the reg writes from the SEQ scripts possible on my card. And I hope the nouveau code is really not that far away from that :)
08:20karolherbst: maybe just some branches are missing
08:21RSpliet: for GDDR5 I know that the main missing bit is perfect clock reg config. I started on it, but then lost the GPU before testing
08:22RSpliet: and a few minor unknown values for random registers, but I think most bases are covered (and if they aren't, they depend on whether it's PLL driven or postdiv)
08:22RSpliet: for DDR3 it's a different story, and the first surprise I saw in your trace was 0x100224
08:22RSpliet: *cough* 0x10f224
08:24karolherbst: ohh wait, I update my paste
08:24RSpliet: no worries, I don't have time to look into it in detail anyway
09:27stephane_: Hello. Linux user for 19 years (Red Hat, Mandrake, Debian, Ubuntu, Fedora, Mint). On a laptop with Mint 17.2 (same observed on Fedora live USB), GM107M [GeForce GTX 850M] + Optimus, shipped with nouveau driver 1:1.0.10-1ubuntu2. nouveau crashes at system startup about 2/3 of the time. http://linuxmint.com/rel_rafaela_cinnamon.php says to install proprietary driver. Alas, nVidia driver freezes video playback (Ctrl-Alt-F1 then back
09:27stephane_: works around). Switched through nVidia GUI to Intel GPU which mostly works but system sometimes lags, dmesg shows warnings with stack traces. Any reference (bug tracker) about nouveau, why it hangs on startup, is it fixed in git? Thanks.
09:28imirkin: stephane_: what kernel?
09:29imirkin: stephane_: while nouveau isn't going to be immensely useful to you with a GM107, it shouldn't crash
09:29stephane_: kernel 3.16.0-38-generic
09:29imirkin: stephane_: also... 19 years? oh hm. that seemed like a big number, but then i realized i've also been using linux for 19 years.
09:30stephane_: welcome to the club :-)
09:30imirkin: time flies :)
09:30stephane_: linux-image-3.16.0-38-generic 3.16.0-38.52~14.04.1 amd64
09:30imirkin: well, 3.16 should contain no support for acceleration on the GM107. not 100% sure there's *any* support for it tbh.
09:30stephane_: "time flies like an arrow, fruit flies like an apple" ;-)
09:30imirkin: what makes you say that nouveau is having issues
09:30imirkin: mmm.... i've always heard it with a banana...
09:31stephane_: Observed crashes on startup. http://linuxmint.com/rel_rafaela_cinnamon.php
09:31imirkin: i don't see a backtrace or any concrete info there
09:31imirkin: perhaps i skimmed it too fast
09:31stephane_: stack not there
09:31mwk: huh, G80 supports fp16 denormals
09:31mwk: that's a surprise
09:31stephane_: just acknowledgement of the bug
09:31imirkin: mwk: via cvt?
09:32mwk: imirkin: yup
09:32imirkin: stephane_: well, that's a pretty ancient kernel
09:32mwk: not like there's any other instruction that knows about f16
09:32imirkin: stephane_: if you're still having issues with kernel 4.4, we can see what's going on
09:32imirkin: stephane_: it's unlikely you'll find people interested in debugging a kernel released 1.5y ago
09:32imirkin: mwk: yeah, i guess not :)
09:33imirkin: mwk: i should probably hook up the unpack/packHalf2x16 stuff for nv50. i only did it for nvc0...
09:35stephane_: Considering installing http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-wily/linux-image-4.4.0-040400-generic_4.4.0-040400.201601101930_amd64.deb
09:35mwk: ugh, I have to clean up the fp routines
09:36mwk: it's like the 7th time I write a normalization loop
09:38stephane_: imirkin: looking at kernel logs at the time I was using nouveau, I see "kernel: [ 3.773038] irq 16: nobody cared (try booting with the "irqpoll" option)"
09:38stephane_: Any hunch?
09:40imirkin: mwk: you should be getting good at it
09:40imirkin: stephane_: what's hooked up to irq 16?
09:41mwk: imirkin: but I'm going to copy this code straight to ISA docs and if you see 15 different normalization loops for 20 instructions, you're going to kill me
09:42imirkin: i dunno, it might actually not be horrible for isa docs to be repetitive
09:42imirkin: since when looking at them you usually want to know how *one* *specific* op works
09:42stephane_: imirkin: good question. grep 16: /proc/interrupts
09:42stephane_: 16: 30 0 0 0 3 0 0 0 IR-IO-APIC-fasteoi ehci_hcd:usb1
09:43imirkin: so... not nouveau
09:43stephane_: usb? coincidence that it appears right at nouveau init time?
09:43stephane_: imirkin: should I expect improvements by using kernel 4.4, or just more willingness to debug?
09:43imirkin: dunno, could be that the ACPI call we make is doing something funky
09:43imirkin: stephane_: it should just work on 4.4...
09:43stephane_: Ah, good thing then
09:44imirkin: i forget when maxwell support was added at all, i think it was 3.17
09:44imirkin: or 3.19
09:44imirkin: and accel for maxwell was added in 4.1
09:44imirkin: that said, expect it to be slow, since we don't have any reclocking support on maxwell
09:46stephane_: imirkin: hmm, thanks for the warning. I see https://launchpad.net/~xorg-edgers/+archive/ubuntu/ppa lists 1:1.0.11+git20150401.212fc535-0ubuntu0sarvatt~trusty while current (no-ppa) update is 1:1.0.10-1ubuntu2. What about linking to this ppa and upgrading to 1.0.11?
09:47imirkin: oh yeah 1.0.10 didn't do maxwell either. but really you should just use DRI3 on your intel ddx which will allow you to switch between nouveau and nvidia at runtime if you like
09:48imirkin: unless you have screens attached to it
09:49imirkin: bbiab. for real this time.
09:49stephane_: imirkin: thanks for the tip. There is the screen laptop (1920x1080) and an external monitor 2560x1440 attached via HDMI. Can it interfere somehow?
09:49stephane_: imirkin: okay waiting for your being back
09:52stephane_: Regarding DRI3, there's another PPA on https://launchpad.net/~ubuntu-desktop/+archive/ubuntu/dri3
09:53stephane_: (If using any PPA just works that'd be enough for me.)
10:06imirkin_: stephane_: with newer laptops, usually it'll all be hooked up via the intel gpu
10:06imirkin_: stephane_: so the nvidia gpu is only for acceleration
10:10stephane_: imirkin: you mean I can probably get away with pure intel driver? If it works it's okay for me, but I thought setups were usually plugged on nVidia chip actually.
10:12imirkin_: not anymore
10:12imirkin_: but you can check
10:12imirkin_: sometimes it is
10:12stephane_: Hmm how can I check that? lspci might actually lend to what you say
10:13stephane_: Intel is listed as VGA compatible controller, and NVIDIA as 3D controller.
10:13imirkin_: usually "3d controller" means "no displays attached"
10:13imirkin_: but not always
10:13stephane_: how to test?
10:13imirkin_: pastebin your dmesg
10:13stephane_: (reminds me of the old 3dfx hacks)
10:13imirkin_: (with nouveau loaded)
10:14imirkin_: i need to look at the DCB stuff it prints
10:14stephane_: of course :-)
10:14stephane_: will have to reboot for that
10:14imirkin_: forget that
10:14imirkin_: look, 99.9% chance it's all hooked up to intel
10:14stephane_: mh okay.
10:14stephane_: So basically I can launch another X session from a plain virtual console with the intel driver and see if it works, right?
10:28redeyedman: when I do echo 0f > /sys/class/drm/card0/device/pstate it says that function isn't implemented
10:29redeyedman: what does that mean?
10:29redeyedman: I have Fermi card 630 GT
10:30imirkin_: redeyedman: it means there's no support for this
10:30imirkin_: no reclocking on fermi
10:30imirkin_: yes. very :(
10:30stephane_: imirkin: after a failed first try (stuck to another console, had to take an old powerpc mac from a drawer to ssh into the machine and chvt 1), I made another xorg.conf swapping the "Screen0" and "Inactive" lines in section ServerLayout, and I have a running X server with a terminal.
10:31imirkin_: stephane_: you shouldn't need an xorg.conf... that way lies sadness.
10:31redeyedman: and is there no possibility in the future?
10:31imirkin_: stephane_: or there should be only very minimal things in there
10:31imirkin_: redeyedman: there's always possibility :)
10:32imirkin_: someone started looking at it again recently
10:42stephane_: imirkin: 31 lines in the config file. "It seems that perfection is attained, not when there is nothing more to add, but when there is nothing more to take away."
10:42stephane_: Could be shorter by not even mentioning nvidia GPU.
10:42imirkin_: stephane_: normally 4-5 lines is sufficient. well, normally 0 lines is sufficient.
10:43stephane_: Can I select a driver with 0 line ? Been used to a minimal xorg.conf for that.
10:43imirkin_: no, but X will know what to do
10:43imirkin_: X the almighty
10:43imirkin_: this isn't XFree86 3.3.6 anymore
10:44stephane_: At boot it selects nvidia GPU and proprietary driver.
10:44stephane_: Anyway, there I have xrandr working (color LUT), two screens ok. vlc complains no xvideo, and indeed fullscreen is slow.
10:44imirkin_: you must have a bunch of configs
10:44stephane_: there was no xorg.conf before I created one to select intel.
10:44stephane_: well, actually there is none at all
10:45stephane_: I ran: startx /usr/bin/xfce4-terminal -- :1 -config xorg.conf.intel2
10:47imirkin_: stephane_: xorg.conf likes to hide nowadays
10:48imirkin_: bits of it are in xorg.conf.d directories sprinkled around your FS
10:48stephane_: It's been around 10 years already, actually.
10:48stephane_: Yeah I forgot those.
10:48stephane_: No /etc/X11/xorg.conf.d anyway.
10:48imirkin_: if no xorg.conf, i think it picks stuff from there
10:50stephane_: dpms works
10:50imirkin_: well i bet you also have a pretty new intel chip, perhaps that's not supported by your stack either :)
10:51stephane_: Laptop bought around march 2015.
10:51imirkin_: what cpu is it?
10:51stephane_: Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz 2 cores.
10:51stephane_: No, 4 multithreaded cores.
10:51imirkin_: ah ok, haswell - you're fine
10:51stephane_: ok thx
10:51imirkin_: if it had been -5xxx it'd be broadwell in which case you might have needed some updates
10:52stephane_: Thanks for the precision.
10:52imirkin_: (or -6xxx, which is skylake, but that wasn't out in march 2015)
10:52stephane_: https://wiki.archlinux.org/index.php/Intel_graphics#Configuration mentions a 4-lines /etc/X11/xorg.conf.d/20-intel.conf . I guess it should be fine.
10:53stephane_: Do all distros look into /etc/X11/xorg.conf.d/ (Mint, here)?
10:53imirkin_: normally yeah
10:53imirkin_: i don't know anything about mint in particular
10:53imirkin_: but as long as they don't do anything _too_ crazy...
10:53stephane_: Well, it's a thin layer on top of Ubuntu anyway.
10:53imirkin_: i don't know anything about ubuntu in particular either ;)
10:53imirkin_: you name it, i don't know it
10:53stephane_: Well, a layer on top of Debian, and ... ;-)
10:54stephane_: By the way, sorry for Ian.
10:54imirkin_: for the past 19 years i've used slackware, then gentoo
10:54imirkin_: with very short redhat and suse stints
10:54imirkin_: early in that timeframe
10:54stephane_: Considered gentoo, but Ubuntu tends to just work, better than Debian in practice.
10:54imirkin_: ancient memories
10:55imirkin_: i'm a big fan of people picking what works best for them
10:55stephane_: imirkin: good philosophy. :-)
10:55imirkin_: what i'm not a fan of is people asking me about how their things work
10:55imirkin_: coz i just don't know :)
10:55karlmag: imirkin_: but Slackware seems to stick? ;-)
10:56imirkin_: karlmag: well, my (awesome) U160 quantum drive died
10:56imirkin_: karlmag: and i had to reinstall
10:56imirkin_: karlmag: and i reinstalled with gentoo, because i was sick of manually tracking down all the dependencies to compile myself
10:56imirkin_: which gentoo did in a much nicer more automated manner
10:56stephane_: (off-topic) On removing a package, can gentoo automatically remove dependencies no longer needed?
10:57karlmag: imirkin_: hmm.. right..
10:57imirkin_: stephane_: nowadays yeah... there's emerge --depclean
10:57karlmag: imirkin_: well.. I've been using slack for the past 20 years or so myself, so...
10:57imirkin_: back in the bad old days when i first started with gentoo... i don't think so
10:57stephane_: I appreciate that Debian and derivatives do it. Fedora seems not to.
10:58stephane_: "big fan of people picking what's best for them". I'm a big fan of "Short term best tool is often what you know now. Long-term best tool is often what you'll learn in the meantime."
10:59karlmag: I've been doing other distros at work though (suse, fedora and ubuntu in particular)
11:00stephane_: Funny how there's https://fr.wikipedia.org/wiki/Emerge but not in English.
11:00karolherbst: well you can always check the tree for non referenced packages though
11:03stephane_: For a quick GPU assessment, is there anything lighter than phoronix?
11:03imirkin_: stephane_: "assessment"?
11:03stephane_: I meant simple benchmark.
11:03karlmag: imirkin_: and yeah, it's sad when nice hardware dies..
11:04stephane_: 15 years ago I remember an old X11-style benchmarking tool, which worked, was light, but took a long time.
11:04imirkin_: stephane_: i like unigine heaven and unigine valley
11:04imirkin_: stephane_: x11perf
11:04stephane_: Ah yeah x11 perf. Thanks.
11:04imirkin_: stephane_: but that measures something else
11:04stephane_: Ah. :-/
11:04imirkin_: it measures... x11 perf :)
11:04imirkin_: (surprising, i know)
11:05stephane_: Well, I won't be doing 3D. Need 2D stuff, playing video.
11:05imirkin_: you say that
11:05imirkin_: but i bet you use gnome or kde or some bs like that
11:05imirkin_: which will use OpenGL on your behalf
11:05stephane_: xfce. Dropped KDE when they did 4 (what a monster) and Gnome even earlier.
11:05stephane_: I think I disabled the compositor.
11:05imirkin_: but that will all run on your intel gpu
11:05imirkin_: which should be quite capable
11:06stephane_: Ok. I guess I'll soon close this X session, to run on Intel only.
11:07karolherbst: well you might want to enable the xfce compositor anyway, I don't think there is any downside to this
11:11stephane_: karolherbst: thanks for the tip. Don't remember quite why I disabled it.
11:11stephane_: What desktop environment is popular in users of this channel?
11:12imirkin_: don't know that there is one. i use WindowMaker :)
11:12imirkin_: and no "desktop"
11:12stephane_: Wow, used WindowMaker in 1997.
11:12stephane_: Notification tool?
11:14stephane_: imirkin: regarding "that measures something else" [What is wrong with X?](http://wayland.freedesktop.org/faq.html#heading_toc_j_6) states it well.
11:15stephane_: karolherbst: Do you use any notification tool, wifi selector?
11:16imirkin_: stephane_: notification?
11:16imirkin_: stephane_: i use wmsystemtray which lets things put up little icons
11:17mwk: ugh, now I'm bogged down in cvt
11:17imirkin_: mwk: which one?
11:17mwk: apparently cvt f32 to f16 with round-to-integer is a real mess
11:17imirkin_: a common use-case, to be sure
11:18stephane_: imirkin: notification, when some network becomes reachable, when a new mail arrives, when music player switched to another music, ...
11:18imirkin_: stephane_: i have no such thing
11:18mwk: first I figured I'd just round-to-int on f32, then do a separate conversion
11:18mwk: but that didn't match
11:18imirkin_: mwk: for what numbers?
11:19mwk: then I tried to do it the other way around
11:19mwk: also no luck
11:19imirkin_: there's funny questions around whether things get rounded to infinity or to max_float
11:19mwk: now I made a monstrosity that does both in single rounding, and it's getting better now
11:19stephane_: Is there a general opinion (consensus) on Wayland here? Positive, negative?
11:19mwk: I have at most 1ulp of error now
11:19imirkin_: stephane_: never touched it myself
11:19mwk: imirkin_: lots
11:20mwk: want an example?
11:20imirkin_: mwk: yes :) give me like 3 examples.
11:20mwk: 0xc6ef3d6e cvt.rmi
11:20mwk: I say 0xf77a, hw says 0xf779
11:21imirkin_: what do those map to? i'm too lazy to convert the fp16 values myself
11:21mwk: 0xc5b47458 rni, I say 0xeda4, hw says 0xeda3
11:21mwk: I have no idea, that's just what my test runner spews out
11:21imirkin_: hmmmm ok
11:22imirkin_: ok, give me a few
11:23mwk: there you go
11:23imirkin_: thanks :)
11:23mwk: rzi mode is notably absent
11:23mwk: but then rz is the nicest-behaved of the rounding modes
11:24imirkin_: mwk: ok, so
11:24imirkin_: 0xc6ef3d6d = -30622.712890625
11:24imirkin_: 0xf77a = -30624.0, 0xf779 = -30608.0
11:25imirkin_: this was with rmi, which is... what?
11:25mwk: round to -inf
11:25imirkin_: so the hw is a lying liar
11:25mwk: hw is a double-rounding liar
11:25imirkin_: perhaps the rounding direction doesn't work on f32 -> f16?
11:25imirkin_: and it always rounds to 0?
11:26mwk: it works ok when I don't round to int
11:27mwk: let's see...
11:29imirkin_: btw this is what i do to look at the numbers: numpy.uint16(0xf779).view('float16')
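(Editor's sketch.) The numpy one-liner above reinterprets a 16-bit pattern as a half float. The same check can be done with only the standard library; the helper names `f16_from_bits` and `f32_from_bits` are made up for this illustration and are not part of any tool mentioned in the channel.

```python
import struct

def f16_from_bits(bits):
    """Reinterpret a 16-bit pattern as an IEEE half-precision float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def f32_from_bits(bits):
    """Reinterpret a 32-bit pattern as an IEEE single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# The two candidate results for the cvt.rmi test case discussed above.
print(f16_from_bits(0xf77a))  # -30624.0 (mwk's model)
print(f16_from_bits(0xf779))  # -30608.0 (hardware)

# The f32 source operand as mwk quoted it.
source = f32_from_bits(0xc6ef3d6e)
print(source)
```

The two half values are adjacent in this range (spacing 16), and the source operand lies between them, which is why the rounding direction decides the result.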
11:30mwk: what the.
11:30mwk: so cvt.rm results in 0xf77a, which is correct
11:30mwk: but .rmi results in 0xf779
11:30mwk: both are integers
11:31mwk: so rm should be identical to rmi
11:31imirkin_: and yet... they're not
11:31imirkin_: did you test my theory?
11:31mwk: I guess it works like mad instruction
11:32mwk: two roundings, only one rounder, one of them gets rz
11:32mwk: ok, let's try forcing the whole rounding mode to rz
11:33mwk: nope, doesn't work
11:33mwk: rpi *does* work a bit
11:34mwk: as in, smallest finite number gets converted to 0x3c00 (1.0)
11:34imirkin_: heh ok
11:34mwk: it's some bullshit like the mad insn
11:35mwk: imirkin_: note that with my single-rounding function, I get errors only on a very narrow range of exponents
11:35imirkin_: ah hm
11:36mwk: corresponding to [2**11, 2**14)
11:36mwk: er, [2**11, 2**15)
11:37mwk: funny, in that range, fp16's are always integers
11:37mwk: well, that sounds like a job for captain If Statement
11:38imirkin_: a little dirty but... meh
11:38mwk: oh wait, 2**15 overflows to inf
11:39imirkin_: i've been told that the DX10 behavior is that non-inf values go to maxfloat
11:39imirkin_: i.e. not infinity
11:40imirkin_: mwk: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/auxiliary/util/u_half.h#n38
11:41mwk: imirkin_: infinities work as usual in IEEE
11:41mwk: nothing weird there
11:42imirkin_: in IEEE, yes. but apparently DX10 had funky behaviour
11:44stephane_: Just for the record: I have closed nearly all windows and am nearly ready to close the X session with proprietary Intel driver. I see messages in dmesg
11:44stephane_: pci 0000:01:00.0: Max Payload Size 16384, but upstream 0000:00:01.0 set to 256; if necessary, use "pci=pcie_bus_safe" and report a bug
11:44stephane_: i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
11:44stephane_: Any comment?
11:44imirkin_: nothing to do with nouveau
11:44stephane_: Sure. :-)
11:44imirkin_: there is also no proprietary intel driver
11:44stephane_: Well, the nVidia driver set to Intel. :-)
11:45stephane_: As stated earlier,
11:46stephane_: It's an optimus setup. The nVidia proprietary driver offers to use NVIDIA or Intel GPU. Since NVIDIA had freezes when playing video, I switched to Intel.
11:46imirkin_: nvidia blob driver can only drive the nvidia gpu.
11:46stephane_: Well, I do observe a different behavior when Intel is selected.
11:46mwk: hmh waitaminute
11:47stephane_: No more freeze on playing video, but sometimes lags (when exiting emacs for example).
11:47mwk: the max fp16 number is ridiculously small
11:47stephane_: Also, having started a second X server appears to lag more, and even sometimes miss the Alt-² combination used to switch windows. WT?
11:47imirkin_: mwk: yeah, 64k or so
11:48imirkin_: er, 32k?
11:48imirkin_: er no. -65504
11:48stephane_: Anyway thank you for hinting me to simply use Intel driver. On another laptop (2012) it looks like it was not possible to use Intel chip alone.
11:48imirkin_: (or +65504)
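(Editor's sketch.) The 65504 figure and the earlier remark that fp16 values in [2**11, 2**15) are always integers can both be checked from the bit layout (1 sign bit, 5 exponent bits with bias 15, 10 mantissa bits). The variable names below are illustrative only.

```python
import struct

def f16_from_bits(bits):
    """Reinterpret a 16-bit pattern as an IEEE half-precision float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

# Largest finite half: exponent field 30, mantissa all ones.
largest = f16_from_bits(0x7bff)
print(largest)  # 65504.0, i.e. (2 - 2**-10) * 2**15

# Every half with unbiased exponent 11..14, i.e. in [2**11, 2**15).
values = [
    f16_from_bits(((exp + 15) << 10) | mantissa)
    for exp in range(11, 15)
    for mantissa in range(1024)
]
all_int = all(v.is_integer() for v in values)
print(min(values), max(values), all_int)
```

With only 10 mantissa bits, the spacing between neighbouring halves is already 2 at 2**11 and grows to 16 at 2**14, so no fractional values exist in that range.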
11:58mwk: imirkin_: got it
11:58mwk: it's really really simple
11:59mwk: turns out my initial idea of doing 32-bit rint first was right
11:59imirkin_: and then the f32 -> f16 is rz?
11:59stephane_: imirkin: strangely I had to reboot. After closing session with proprietary DDX, I could switch to the other (free intel) session. I killed -9 -1 because I was weary of xfce clobbering my config. That switched to vt 1 but no key worked. strace chvt via ssh appeared to work but screen stuck on console 1. After a reboot I get a nice reactive X server, no lag. Thanks again.
12:00mwk: imirkin_: it's *mostly* rz, except overflows return Inf instead of max finite (as would be usual for rz)
12:00mwk: or... scratch that
12:01mwk: overflows behave as usual for the rounding mode (rm gives -Inf but +maxfin), but rounding itself is like for rz
12:01stephane_: Saying this on the nouveau channel is a bit like discussing POTS telephony on an ADSL-related channel. Thanks anyway. Am probably over now.
12:02mwk: so yeah... exactly the mad deal
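(Editor's sketch.) A rough model of what mwk describes for cvt f32 -> f16 with round-to-integer: round to an integer in f32 using the requested mode, then narrow to f16 by truncation (rz-like), with overflow still following the requested mode. This is not mwk's actual test code. The function names, the mode strings "rn"/"rm"/"rp"/"rz", and the overflow threshold (treated here as "truncated magnitude exceeds 65504") are assumptions; NaN, Inf inputs and the sign of zero are not handled.

```python
import math
import struct

F16_MAX = 65504  # largest finite half-precision value

def f32_from_bits(bits):
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f16_to_bits(value):
    return struct.unpack("<H", struct.pack("<e", value))[0]

def round_to_int(x, mode):
    """Step 1: round the f32 value to an integer in the requested mode."""
    if mode == "rn":
        return round(x)       # Python rounds ties to even
    if mode == "rm":
        return math.floor(x)  # toward -inf
    if mode == "rp":
        return math.ceil(x)   # toward +inf
    if mode == "rz":
        return math.trunc(x)  # toward zero
    raise ValueError(mode)

def cvt_f32_to_f16_rint(bits, mode):
    n = round_to_int(f32_from_bits(bits), mode)
    negative = n < 0
    mag = abs(n)
    # Step 2: keep only the top 11 significant bits (implicit 1 + 10
    # mantissa bits), discarding the rest as round-toward-zero would.
    shift = max(mag.bit_length() - 11, 0)
    mag = (mag >> shift) << shift
    if mag > F16_MAX:
        # Assumption: overflow picks Inf or max-finite per the requested mode.
        to_inf = (
            mode == "rn"
            or (mode == "rm" and negative)
            or (mode == "rp" and not negative)
        )
        result = math.inf if to_inf else float(F16_MAX)
    else:
        result = float(mag)
    return f16_to_bits(-result if negative else result)

# The two mismatches quoted earlier in the log.
print(hex(cvt_f32_to_f16_rint(0xc6ef3d6e, "rm")))  # 0xf779
print(hex(cvt_f32_to_f16_rint(0xc5b47458, "rn")))  # 0xeda3
```

Under these assumptions the sketch reproduces the hardware values 0xf779 and 0xeda3 from the two examples above, where a single correctly rounded conversion would give 0xf77a and 0xeda4.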
12:09mwk: works perfectly now
12:09mwk: f2f down
12:09mwk: ... for non-fp64 cases at least
12:11mwk: that would leave f2i and i2f, and then I'm done with fp32 instructions
12:18mwk: when I'm done with that, I'll have to make a big fat drawing of the FPU
12:19mwk: with the single damn rounder
12:21mwk: ok, let's attempt i2f now
12:30mwk: feels like the mathematician with the pot and stove right now
12:31mwk: using fp32_add(x, 0) merely to change NaNs to the canonical NaN
12:31imirkin_: probably more like the chip designer
12:36stephane_: imirkin: quick glmark2 Score (2 seconds per test): 178
12:36stephane_: Don't know how it compares with the rest of the world, though.
12:36imirkin_: stephane_: i assume you're running that on your intel gpu
12:37karolherbst: stephane_: well I use kde plasma 5, so I use the kde plasma stuff
12:37stephane_: Overall feeling is much smoother than the proprietary driver.
12:38stephane_: karolherbst: thanks.
12:38karolherbst: Tom^: do you get over 107 fps in cs:go at 2560x1600? :D
12:38Tom^: no idea, doubt it.
12:38karolherbst: stephane_: is that a desktop or laptop system?
12:38stephane_: karolherbst: laptop, why?
12:38karolherbst: just asking
12:39stephane_: karolherbst: ran benchmark on external monitor 2560x1440, don't know if it skews results in some way (probably not, I guess it computes pixels or vertices per second anyway).
12:41karolherbst: Tom^: gputest furmark at 2560x1600?
12:42karolherbst: Tom^: please do the benchmark there :D I just want to see the result
12:42Tom^: means removing blob and rebooting, and im going to bed. i can tomorrow
12:43karolherbst: there is a new benchmark at phoronix and these are easy to compare with radeon :D
12:46karolherbst: mhh I am close to the HD 6870
12:46karolherbst: which has like 33% more GFLOPS than mine :D
12:48Yoshimo: i tried to test the fermi branch, does it work by default or does it need any kind of boot parameters karol?
12:50karolherbst: ohh the fermi github branch?
12:50karolherbst: or just those patches?
13:01karolherbst: imirkin_: slowly I get the feeling, that the nouveau compiler is pretty fast compared to the radeon one :O
13:07karolherbst: I mean, I just ran a test and I was faster than a radeon card with faster memory and core :O
13:07karolherbst: imirkin_: or how would you compare the HD 6870 against a gtx 770m?
13:07imirkin_: well, nvidia hw might have something to do with it too
13:07imirkin_: well, HD 6xxx is a full generation older
13:08karolherbst: yeah, but mine is a mobile chip
13:08imirkin_: all these things come in 100 diff configurations
13:08karolherbst: half speed of R9 285
13:09imirkin_: HD 6870 would also use r600
13:09imirkin_: not radeonsi
13:09imirkin_: the compiler is different (not llvm)
13:09imirkin_: and the arch is quite frankly very odd
13:10karolherbst: the R9 285 should have around 3290 GFLOPS, which would somehow fit then
13:11karolherbst: then the perf per "GFLOP" is the same
13:11karolherbst: still amazing
13:12mwk: ugh, f16 is a fucking disaster
13:12mwk: i2f is doing crazy shit on u32 -> f16 conversion
13:13imirkin_: that's supported?
13:13mwk: according to envydis, it is
13:13mwk: ... I might be removing it from here in short order
13:13mwk: it *somewhat* works
13:14glennk: karolherbst, furmark is pretty much a fill rate benchmark on modern-ish cards
13:14karolherbst: ohh right
13:21gryffus: karolherbst: i have some time for testing the Fermi reclock and here are some results: http://susepaste.org/16800100
13:22mwk: imirkin_: okay, i2f u32 -> f16 is a disaster
13:22mwk: apparently hw sometimes forgets to check some MSBs of the u32 source to detect overflow
13:22imirkin_: mwk: is there a u16 -> f16 too?
13:22karolherbst: gryffus: so basically the same as on my fermi: around 25%
13:23karolherbst: maybe less on your
13:23imirkin_: or is that only on nvc0?
13:23mwk: and how many MSBs it forgets to check depends on the MSB of the low 16 bits
13:23gryffus: It's pretty stable, the benchmark only froze Xorg in about 1-2 tries of 10
13:23mwk: imirkin_: there is u16 -> f16 and it appears to work perfectly
13:23karolherbst: gryffus: yeah this can have various issues
13:23karolherbst: but thanks for testing
13:23gryffus: karolherbst: i will try to get some kernel output ASAP
13:23gryffus: no problem
13:24karolherbst: gryffus: yeah, would be nice to know what's going on when it fails
13:24karolherbst: but I think this is just a simple reclocking misconfiguration
13:24karolherbst: I don't think most of the stuff is yet verified, so it's already good to know that it somehow works
13:25karolherbst: meehhh, all those games
13:25karolherbst: I need a bigger HDD :/
13:25gryffus: yeah, TBH i was impressed how smoothly it did go
13:25gryffus: i was expecting crashes right after reclocking
13:25gryffus: but no :)
13:25karolherbst: 180+240+130GB, just for my various steam libraries :/
13:27gryffus: karolherbst: yeah as i said... Steam is a crap :X
13:27karolherbst: it isn't steam's fault
13:27karolherbst: bioshock infinite just needs like 42GB alone :/
13:28karolherbst: maybe I should convert my steam partition to btrfs and do compression :/
13:29gryffus: Well, Serious Sam 3 and GW2 are also not so small... Especially when you keep the isos...
13:29gryffus: that's granted
13:29karolherbst: ss3 is small
13:30karolherbst: 5.2GB here
13:30karolherbst: that's like nothing
13:31gryffus: well it's my 4th biggest
13:31karolherbst: lucky you
13:31karolherbst: it's maybe my 20th biggest
13:31gryffus: i mainly play games 10+years old :x
13:32gryffus: GW2, WoTLK, SS3, Mafia 2 are my biggest :))
13:32gryffus: btw, years ago i have migrated to btrfs for my home
13:33gryffus: and it was one of the dumbest things i have ever done
13:33gryffus: when it's low on space, it's HORRIBLY slow even with disabled compression
13:35gryffus: and even with a free space, it's just slow :( maybe i have some extreme fragmentation from torrent usage, dunno, but it's terrible :x
13:48mwk: imirkin_: fun.
13:48mwk: so, here's how i2f works
13:49mwk: take the 32-bit input value
13:49mwk: if you're converting s32, take abs() of it instead
13:49mwk: let x be the number of top 0 bits in the low half of that 32-bit word
13:50mwk: set top min(2+x, 7) bits of that word to 0.
13:51mwk: and, only if rounding mode is round to nearest or round up, check if bits 5..29 are all 1. if so, change the whole word to 0.
13:51imirkin_: mwk: so basically the high bits are bogus?
13:51mwk: basically, the high bits are sometimes ignored
13:52mwk: but it only happens if you'd have gotten an Inf anyway
13:52mwk: I guess that counts as a hw bug...
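(Editor's sketch.) A literal transcription of the steps mwk lists for what the hardware does to the source word before the u32/s32 -> f16 conversion. The function name `i2f_f16_premangle`, the mode strings, and the reading of "round up" as round toward +inf ("rp") are assumptions; the sketch only models the mangling of the input word, not the conversion that follows.

```python
def i2f_f16_premangle(word, signed, mode):
    """Return the 32-bit word the converter appears to see, per mwk's steps."""
    word &= 0xFFFFFFFF
    # s32 sources: take the absolute value first.
    if signed and word & 0x80000000:
        word = (-word) & 0xFFFFFFFF
    # x = number of leading zero bits in the low 16-bit half.
    low = word & 0xFFFF
    x = 16 - low.bit_length()
    # Clear the top min(2 + x, 7) bits of the 32-bit word.
    cleared = min(2 + x, 7)
    word &= 0xFFFFFFFF >> cleared
    # Only for round-to-nearest / round-up: if bits 5..29 are all ones,
    # the whole word becomes 0.
    if mode in ("rn", "rp"):
        mask = 0x3FFFFFE0  # bits 5..29
        if (word & mask) == mask:
            word = 0
    return word

# 2**31 has an all-zero low half, so 7 top bits are dropped and nothing is left.
print(hex(i2f_f16_premangle(0x80000000, signed=False, mode="rz")))  # 0x0
# A value that fits in the low half is left alone.
print(hex(i2f_f16_premangle(0x00008000, signed=False, mode="rz")))  # 0x8000
```

Since every affected input is above the fp16 range, a correct conversion would have produced Inf for it, which matches mwk's remark that the high bits only matter in cases that should overflow anyway.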
14:25stephane_: imirkin: thanks again. My PC feels less laggish now. The key information you gave me was that it can run on free intel driver. That was contrary to my previous experience and hints from Mint. Kudos.
14:27karolherbst: stephane_: what kind of setup did you have before? modesetting intel, nvidia offloading?
14:27karolherbst: or intel deactivated through vbios
14:49mwk: fucking hw bugs.
15:09stephane_: karolherbst: I don't know the details. Installed NVIDIA proprietary driver. Use graphical settings manager to switch to Intel.
15:10stephane_: karolherbst: also, bumblebee is not installed, but nvidia-prime is.
15:11karolherbst: ahh nvidia-prime
15:24mwk: okay, got the simple case of f2i working
15:24mwk: now let's add some modifiers and ruin my day.
15:36mwk: unbelievable, no weirdness
15:52mwk: so... all fp32 instructions covered by the testsuite
15:53mwk: I guess now I get to clean that mess I made of my code and write the docs
15:53imirkin_: and then the G80 will have the best isa docs of any gpu :)
15:54mwk: and then you notice float instructions aren't all that interesting and you'd rather have the control flow hellstructions documented
17:54imirkin: Tom^: there's an off chance something i pushed will get you another point in heaven :)
18:55gnurou: skeggsb: sent a little fix for GM107 LTC - I hit some timeouts on GM206 when rendering 3D
19:05skeggsb: gnurou: thanks
19:17skeggsb: gnurou: what bit was set for that long?
19:18gnurou: skeggsb: uh, lemme check...
19:24gnurou: skeggsb: bit 4. when set it instructs LTC to invalidate only clean lines - it is set by default
19:29skeggsb: ah right, i did indeed miss that. that bit seems new in gm20x (or, it's just not enabled by default earlier)
23:17gnurou: skeggsb: you're right, this bit is new in gm20x