02:36 huehner: imirkin: hi the dcb docs published by nvidia have an entry for that unknown connector 70:
02:36 huehner: 0x70 = Virtual connector for Wifi Display (WFD)
02:36 huehner: and i thought that my non-working hdmi was that 0x70
05:42 huehner: imirkin: i got hdmi working :)))
05:45 pmoreau: huehner: Nice! What did you have to change?
05:45 huehner: problem was we did not detect hdmi as connected
05:45 huehner: trouble was the i2c port was marked as unused in the dcb table parsing
05:45 huehner: let me look up the code as i do not understand what that line should be doing anyway
05:47 huehner: in nouveau/nvkm/subdev/bios/i2c.c function dcb_i2c_parse
05:47 huehner: inside the if 0x41 -> the nv_ro32
05:47 huehner: is reading 32bit from bios which is a single dcb communication control block entry
05:48 huehner: however that & 0x800000000 i don't get
05:48 huehner: what should that be doing ?
05:49 pmoreau: You check that the entry at ent is activated I guess
05:49 huehner: but how? can that condition match 32bit value?
05:50 pmoreau: & does a bitwise AND
05:50 huehner: yes and the constant has only 1's way out of range of a 32bit value read by the nv_ro32 isn't it?
05:51 huehner: as further down in that case block there are more function checks matching the dcb-docs nvidia published
05:51 pmoreau: So, if the 32bit value has bit 31 turned on, the result of `nv_ro32(bios, ent) & 0x80000000` will be equal to 0x80000000, 0x0 otherwise
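A minimal sketch of the mask test pmoreau describes, in plain userspace C. The helper names `bit31_set` and `and_with_bit35` are hypothetical and do not come from nouveau. The second helper shows why a constant with one extra zero (0x800000000, i.e. bit 35) could never match a value read as 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if bit 31 of a 32-bit word is set, else 0. */
static int bit31_set(uint32_t word)
{
    return (word & 0x80000000u) != 0;
}

/*
 * Hypothetical helper: AND a 32-bit word with 0x800000000 (bit 35).
 * No 32-bit value has bit 35 set, so the result is always 0.
 */
static uint64_t and_with_bit35(uint32_t word)
{
    return (uint64_t)word & 0x800000000ull;
}
```

With the 8-digit constant the test distinguishes words by their top bit; with the 9-digit one it would be a dead condition.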
05:53 huehner: need more coffee first ;) sry
05:53 pmoreau: No problem! :D
05:53 huehner: but anyway i think that condition is not completely fine
05:53 huehner: from dcb spec: two 5bit fields left-most
05:54 huehner: each value 0x1F means deactivated
05:54 huehner: and i see value 10000048
05:54 huehner: which means that check marks unused, but incorrectly (according to spec)
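A rough sketch of the field-based check huehner is proposing, next to the bit-31 test discussed earlier. Several things here are assumptions not stated in the log: that 10000048 is a hexadecimal value, that the two 5-bit fields sit at bits 0-4 and 5-9 of the entry, and that an entry counts as unused only when both fields read 0x1F. All macro and function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: field positions are illustrative, not taken from the log. */
#define PORT_A_SHIFT 0u
#define PORT_B_SHIFT 5u
#define PORT_MASK    0x1fu
#define PORT_UNUSED  0x1fu   /* "each value 0x1F means deactivated" */

/* Extract one 5-bit port field from a 32-bit entry. */
static uint32_t port_field(uint32_t entry, unsigned shift)
{
    return (entry >> shift) & PORT_MASK;
}

/* ASSUMPTION: the entry is unused only if both fields are 0x1F. */
static int entry_unused_by_fields(uint32_t entry)
{
    return port_field(entry, PORT_A_SHIFT) == PORT_UNUSED &&
           port_field(entry, PORT_B_SHIFT) == PORT_UNUSED;
}

/* The bit-31 test from the existing check, for comparison. */
static int entry_has_bit31(uint32_t entry)
{
    return (entry & 0x80000000u) != 0;
}
```

Under these assumptions, 0x10000048 has bit 31 clear, so a check that requires bit 31 would treat it as unused, while neither 5-bit field equals 0x1F, so the field-based check would treat it as in use.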
05:55 huehner: let me attach a experimental patch to my bug-report from yesterday
05:55 huehner: and lets someone who wrote that code take a look
05:55 huehner: probably recent only as dcb version 0x41 is i think very new (maxwell)
05:56 huehner: hdmi lights up but has at least 1 bug
05:56 huehner: a vertical line 1 pixel wide at top left: in pink color
05:57 pmoreau: I think I already heard but some similar bugs
05:57 pmoreau: s/but/about
05:57 huehner: + resolution seems off also
05:57 huehner: but that maybe as i have also vga/dvi-i connected
05:58 pmoreau: Dunno
05:59 huehner: resolution on monitor says it's 1920x1080 so as it should, just see black areas to down+right matching the 'extra resolution' compared to vga
05:59 huehner: but fine
05:59 huehner: i can see something on my screen without secondary vga needing to be plugged in :)
05:59 huehner: so i'm happy for now :D
05:59 pmoreau: :)
05:59 pmoreau: Let's see for that check
06:00 huehner: ot: i think i did not reboot so often in several months than today
06:00 huehner: any better way to debug those things?
06:00 huehner: what i did is kind of manually dig through code from 'connection detection' -> i2c missing to 'i2c bios parsing' adding nv_warn's all the way to trace code-flow
06:01 pmoreau: This is usually what I also do when I debug in parts I don't know yet
06:04 huehner: oks... which are all for me
06:05 pmoreau: `dump_stack()` can come handy to find out who called this function
06:21 huehner: and basic X working :)
06:21 pmoreau: Eh eh eh!
06:21 huehner: just treating gm206 and earlier revs
06:21 huehner: same as i did yesterday in kernel
06:22 huehner: that is kind of rewarding after staring hours and hours at the i2c/hdmi stuff
07:07 imirkin: huehner: re 0x70 -- i thought i pasted the same thing in the chan last night :)
07:07 imirkin: glad to hear you got hdmi working... send a patch, that should start a discussion, whether it's the wrong or the right way
07:08 huehner: imirkin: 0x70 maybe you told me but then i didn't really get the message
07:08 huehner: imirkin: patch is already on the list together with trivial xorg-ddx patch to add 0x120 also
07:09 imirkin: awesome... behind on my emails :)
07:10 imirkin: huehner: you should add a Signed-off-by: ... for kernel patches
07:10 imirkin: (take a look at the doc that explains what S-o-b implies... but it's basically "i didn't do anything bad")
07:11 huehner: at least not intentionally
07:11 imirkin: by bad i mean illegal
07:11 imirkin: not bad as in wrong code
07:11 huehner: clear
07:11 huehner: ot: lucky nvidia published that dcb specs, without that would have been kind of impossible for me to understand any of that code
07:12 imirkin: :)
07:12 imirkin: imagine it for the people who wrote the code before those docs were published :)
07:13 huehner: i just did... i owe you guys some beer in case we meet some day
07:34 pstglia9874: Hi everybody, how are you doing? Can you tell in which situations I can receive these error messages from the nouveau kernel driver?
07:34 pstglia9874: nouveau E[ PFIFO][0000:01:00.0] DMA_PUSHER - ch 6 [.harism.effects[4883]] get 0x00200c8d8c put 0x00200d9c9c ib_get 0x0000000b ib_put 0x0000000e state 0x80006c30 (err: INVALID_CMD) push 0x00400040
07:35 pstglia9874: wrong tile_mode/flags/pitch when creating a bo can cause this?
07:35 imirkin: pstglia9874: i've seen this before, unfortunately i have no *clue* where that invalid command is coming from...
07:35 imirkin: most importantly, that command is not invalid in the first place
07:35 imirkin: which means it's some other thing going wrong
07:39 pstglia9874: We're trying to debug this (Android-x86 project - trying to make nouveau work on it - can boot and run some apps, but there are still bugs). Based on debugging done by a forum member, problems occur on these egl calls:
07:39 pstglia9874: eglSwapBuffers
07:40 pstglia9874: glDrawArrays, glGetAttribLocation, glGetUniformLocation, glUniformMatrix4fv, glUniform3fv
07:40 imirkin: pstglia9874: which gpu?
07:40 pstglia9874: Geforce 210 (Fermi)
07:40 imirkin: pstglia9874: also if you have an application that reliably reproduces this, i'd be very interested
07:41 imirkin: it's either fermi, or it's a GeForce 210
07:41 pstglia9874: Ooops, Tesla :)
07:41 imirkin: there were a few diff GT 210's made (G84, G98, GT218), none fermi :p
07:42 pstglia9874: We can reproduce it using an app called "android_effects" from harism: https://github.com/harism/android_effects
07:42 imirkin: anyways, if you search bugzilla for 00400040 or 00406040 you'll see a bunch of reports
07:43 imirkin: btw, can i also assume you're running a recent kernel?
07:43 imirkin: or is this one of the funny 3.4 kernels or something?
07:43 pstglia9874: 3.18.2
07:43 imirkin: k
07:44 imirkin: hrmph... if you could make a version of this for linux that still repros the issue, that'd be awesome
07:47 pstglia9874: Can try to ask someone on group. However, we can enable similar debugging messages as Linux (Mesa, drm, etc).
07:47 imirkin: i don't want debug messages
07:47 imirkin: i want an application i can play with ;)
07:47 imirkin: i have no idea where to even start debugging it
07:48 imirkin: so i couldn't really provide instructions on how to debug things
07:49 pstglia9874: Understand. What I could send you is a link for one of iso's we created (Android-x86), so you could boot on your hw
07:49 imirkin: yeahhhh.... you could. it's unlikely i'd get around to doing that though.
07:50 imirkin: android is a dog to use, esp for debugging, kernel modification, etc
07:51 imirkin: anyways... that 00400040 thing is some object binding command
07:51 imirkin: i think
07:53 pstglia9874: No prob. We'll try to compile a linux app (maybe based on this harism) and see if we can reproduce that. Meanwhile, I'll keep playing on Android side based on your hints (00400040 / 00406040)
07:53 imirkin: although hrm
07:53 imirkin: i guess it means "method 0x40", which isn't a thing afaik
07:54 imirkin: so it could be some channel desync thing... although at that point, why is it always method 0x40 in the errors? i've never seen an invalid_cmd thing like that for any other method
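One way to see where "method 0x40" comes from is to split the `push` word into fields. The sketch below assumes the older nouveau command-header layout (method offset in bits 2-12, subchannel in bits 13-15, data word count in bits 18-28). That layout is an assumption and is not stated in the log. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded view of one pushbuf command header. */
struct push_hdr {
    uint32_t method;  /* method offset in bytes */
    uint32_t subc;    /* subchannel index */
    uint32_t count;   /* number of data words that follow */
};

/* ASSUMPTION: bits 2-12 method, 13-15 subchannel, 18-28 count. */
static struct push_hdr decode_push_hdr(uint32_t word)
{
    struct push_hdr h;

    h.method = word & 0x1ffcu;
    h.subc   = (word >> 13) & 0x7u;
    h.count  = (word >> 18) & 0x7ffu;
    return h;
}
```

Under that layout, 0x00400040 decodes to method 0x40 on subchannel 0 with 16 data words, and 0x00406040 to the same method and count on subchannel 3.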
07:55 pstglia9874: Tks for the explanations. In case you want to see the progress so far: https://drive.google.com/file/d/0BxO6THtB865fVkdLam9MM0p1YVk/view?usp=sharing
07:56 pstglia9874: We can boot, play some opengl apps, watch streaming (youtube)
07:56 imirkin: pstglia9874: two thoughts... #1 -- if you can capture an apitrace that triggers the issue when replayed, that'd be very helpful
07:56 imirkin: pstglia9874: also if the application uses multiple GL contexts from multiple threads, that won't work with nouveau right now
07:57 mlankhorst: I have patches that could make it work :P
07:57 mlankhorst: I just need to toy with an app that needs it..
07:57 mlankhorst: haven't found one yet
07:57 imirkin: mlankhorst: the threading stuff? yeah
07:57 imirkin: mlankhorst: iirc i had some comments on those patches...
07:57 mlankhorst: it's not complete
07:58 imirkin: but it makes catastrophic failure a lot less likely
07:58 mlankhorst: I wish I had android fences..
07:58 imirkin: since at least the pushbufs are kept separate :)
08:01 pstglia9874: We could try those patches. Are they available on github or elsewhere?
08:02 mlankhorst: tbh give me a testcase and I'll take a look..
08:05 pstglia9874: Have an apitrace got from https://github.com/harism/android_effects: https://groups.google.com/group/android-x86/attach/cbfa1b42190654de/GT120_fi.harism.effects.trace?part=0.4&authuser=0
08:06 imirkin: pstglia9874: does replaying it cause the same errors?
08:06 pstglia9874: yes
08:08 imirkin: bleh, doesn't seem to replay here at all :(
08:08 imirkin: 13: warning: unsupported eglSurfaceAttrib call
08:13 pstglia9874: When you say replay you mean I should reproduce the same error every attempt of running the same app or anything else?
08:15 imirkin: well, for me it doesn't really do anything... i think some sort of egl difference... dunno
08:17 pstglia9874: All right. Thanks for the information and help. Will try to get more info and a test case (on Linux) where I can reproduce this. bye!
09:23 mlankhorst: imirkin: http://paste.debian.net/154964/ does this look sane to fix a very rare race condition? :P
09:25 mlankhorst: I have no idea how someone sane would hit it, but it exists!
09:25 imirkin: mlankhorst: can you explain what the condition is?
09:26 imirkin: i should probably open up nouveau.c and see the surrounding context :)
09:26 imirkin: basically a race in deleting and ref'ing a bo?
09:27 imirkin: how could it happen? bo is owned by the device, and both of those operations happen under device lock
09:27 mlankhorst: sort of, unref does not
09:28 mlankhorst: so if a refcount was 1, drops to zero, another thread revives it before the lock is taken, then releases it
09:28 mlankhorst: 2 threads could end up freeing the bo :p
09:28 mlankhorst: or other bad things
09:29 mlankhorst: mostly came up in a discussion from intel-gfx some time ago that not even nouveau's bo stuff was completely thread-safe
09:30 imirkin: mlankhorst: which specific 2 functions would race against one another?
09:30 imirkin: (i.e. what api entrypoints)
09:30 imirkin: [i'm not saying you're wrong... just trying to understand]
09:30 mlankhorst: nouveau_bo_ref unreffing and any code that revives
09:30 imirkin: k
09:31 imirkin: so 2x nouveau_bo_ref being called at the same time to unref a bo with refcount == 2?
09:31 mlankhorst: no
09:32 mlankhorst: nouveau_bo_ref dropping a bo to zero, and something like an import function 'reviving' the bo
09:32 imirkin: oh, that race
09:32 imirkin: yes, that's definitely one we were aware of
09:33 imirkin: hm, do we care?
09:33 mlankhorst: shrug, I sort of do :P
09:33 imirkin: fair enough
09:33 mlankhorst: just because I end up having to fix any code with race conditions in the end anyway :/
09:34 imirkin: hehehe
09:36 imirkin: so, the refcnt is dropped without any lock
09:36 mlankhorst: yeah
09:36 imirkin: but bo_del actually takes the lock to do its work
09:36 mlankhorst: indeed
09:37 imirkin: so you're trying to deal with the situation where the refcnt is dropped, and bo_del is in the middle of doing things
09:37 imirkin: when someone else refs the thing?
09:37 mlankhorst: and then unrefs it again :p
09:37 mlankhorst: it's prevented by having the revive wait for bo_del to finish
09:37 imirkin: seems like a common use-case :p
09:38 imirkin: but wait, i thought that even without the second unref, things were fubar'd
09:38 imirkin: i.e. if you ref while bo_del runs, you're screwed
09:38 mlankhorst: no i prevented that with the atomic_read there
09:38 imirkin: and the subsequent unref is just icing on the cake
09:39 mlankhorst: if the buffer is shared (nvbo->name) it handles things slightly different
09:40 imirkin: ugh
09:40 imirkin: right
09:40 imirkin: that stuff's all just crazy
09:40 imirkin: this is giving me a headache
09:50 mlankhorst: :p
12:25 pmoreau: How does one convert a drm_device to a nice nvxx_something_priv that can be passed to nv_wr32?
12:30 imirkin: pmoreau: you can't
12:30 imirkin: er
12:30 imirkin: i guess you can do like drm_device->device or something
12:31 imirkin: that should be a nv_device no?
12:31 pmoreau: Seems to work :)
12:31 pmoreau: Thanks!
13:19 RSpliet: skeggsb: is the intention ramnv50 -> ramg80?
13:19 imirkin: i think nv50 was the internal name used for g80
13:38 AAA: hey guys, I made a stupid mistake in yaboot.conf and now I can't get the system to boot. I'm hoping someone can steer me in the right direction.
13:39 AAA: I commented out the root="UUID=foo" and replaced it with root=/dev/sda2
13:40 AAA: I've tried giving it the root= and boot= stuff from yaboot, but it still refuses to boot. any ideas would be great
13:46 imirkin: dunno, never used yaboot -- you should try a support channel for yaboot or perhaps a more general linux one
14:22 AAA: thanks. fwiw, I figured it out. the ext4 module wasn't being loaded in the initrd.img. once I loaded that I could mount and fix.
15:33 buhman: AAA: :D
15:33 buhman: AAA: though I'm curious how that's related to nouveau ;p
15:53 AAA: buhman: it's not, really. just that I'm trying to compile a kernel and nouveau driver for my ppc using your wiki